                    Random Graphs
The book is devoted to the study of classical combinatorial structures, such as
random graphs, permutations, and systems of random linear equations in finite fields.
The author shows how the application of the generalized scheme of allocation
in the study of random graphs and permutations reduces the combinatorial
problems to classical problems of probability theory on the summation of independent
random variables. He concentrates on recent research by Russian mathematicians,
including a discussion of equations containing an unknown permutation. This is the
first English-language presentation of techniques for analyzing systems of random
linear equations in finite fields.
These results will interest specialists in combinatorics and probability theory
and will also be useful in applied areas of probabilistic combinatorics, such as
communication theory, cryptology, and mathematical genetics.
V. F. Kolchin is a leading researcher at the Steklov Institute and a professor at the
Moscow Institute of Electronics and Mathematics (MIEM). He has written four
books and many papers in the area of probabilistic combinatorics. His papers have
been published mainly in the Russian journals Theory of Probability and Its
Applications, Mathematical Notes, and Discrete Mathematics, and in the international
journal Random Structures and Algorithms.


ENCYCLOPEDIA OF MATHEMATICS AND ITS APPLICATIONS
Edited by G.-C. Rota
Editorial Board: R. Doran, M. Ismail, T.-Y. Lam, E. Lutwak
Volume 53: Random Graphs
ENCYCLOPEDIA OF MATHEMATICS AND ITS APPLICATIONS
Random Graphs
V. F. KOLCHIN
Steklov Mathematical Institute, Moscow
CAMBRIDGE UNIVERSITY PRESS
PUBLISHED BY THE PRESS SYNDICATE OF THE UNIVERSITY OF CAMBRIDGE
The Pitt Building, Trumpington Street, Cambridge CB2 1RP, United Kingdom
CAMBRIDGE UNIVERSITY PRESS
The Edinburgh Building, Cambridge CB2 2RU, UK http://www.cup.cam.ac.uk
40 West 20th Street, New York, NY 10011-4211, USA http://www.cup.org
10 Stamford Road, Oakleigh, Melbourne 3166, Australia
© Cambridge University Press 1999
This book is in copyright. Subject to statutory exception and to the provisions of relevant collective licensing agreements, no reproduction of any part may take place without the written permission of Cambridge University Press.
First published 1999
Printed in the United States of America
Typeface Times Roman 10/13 pt. System LaTeX [RW]
A catalog record of this book is available from the British Library
Library of Congress Cataloging in Publication data
Kolchin, V. F. (Valentin Fedorovich)
Random graphs / V. F. Kolchin
p. cm. - (Encyclopedia of mathematics and its applications; v. 53)
Includes bibliographical references and index.
ISBN 0 521 44081 5 hardback
1. Random graphs. I. Title. II. Series.
QA166.17.K65 1999 98-24390
511'.5-dc20 CIP
CONTENTS
Preface
1 The generalized scheme of allocation and the components of random graphs
1.1 The probabilistic approach to enumerative combinatorial problems
1.2 The generalized scheme of allocation
1.3 Connectivity of graphs and the generalized scheme
1.4 Forests of nonrooted trees
1.5 Trees of given sizes in a random forest
1.6 Maximum size of trees in a random forest
1.7 Graphs with unicyclic components
1.8 Graphs with components of two types
1.9 Notes and references
2 Evolution of random graphs
2.1 Subcritical graphs
2.2 Critical graphs
2.3 Random graphs with independent edges
2.4 Nonequiprobable graphs
2.5 Notes and references
3 Systems of random linear equations in GF(2)
3.1 Rank of a matrix and critical sets
3.2 Matrices with independent elements
3.3 Rank of sparse matrices
3.4 Cycles and consistency of systems of random equations
3.5 Hypercycles and consistency of systems of random equations
3.6 Reconstructing the true solution
3.7 Notes and references
4 Random permutations
4.1 Random permutations and the generalized scheme of allocation
4.2 The number of cycles
4.3 Permutations with restrictions on cycle lengths
4.4 Notes and references
5 Equations containing an unknown permutation
5.1 A quadratic equation
5.2 Equations of prime degree
5.3 Equations of compound degree
5.4 Notes and references
Bibliography
Index
PREFACE

Combinatorics played an important role in the development of probability theory, and the two have continued to be closely related. Now probability theory, by offering new approaches to problems of discrete mathematics, is beginning to repay its debt to combinatorics. Among these new approaches, the methods of asymptotic analysis, which have been well developed in probability theory, can be used to solve certain complicated combinatorial problems.

If the uniform distribution is defined on the set of combinatorial structures in question, then the numerical characteristics of the structures can be regarded as random variables and analyzed by probabilistic methods. By using the probabilistic approach, we restrict our attention to "typical" structures that constitute the bulk of the set, excluding the small fraction with exceptional properties.

The probabilistic approach that is now widely used in combinatorics was first formulated by V. L. Goncharov, who applied it to $S_n$, the set of all permutations of degree $n$, and to the runs in random (0,1)-sequences. S. N. Bernstein, N. V. Smirnov, and V. E. Stepanov were among those who developed probabilistic combinatorics in Russia, building on the famous Russian school of probability founded by A. A. Markov, P. L. Lyapunov, A. Ya. Khinchin, and A. N. Kolmogorov.

This book is based on results obtained primarily by Russian mathematicians and presents results on random graphs, systems of random linear equations in GF(2), random permutations, and some simple equations involving permutations. Selecting material for the book was a difficult job. Of course, this book is not a complete treatment of the topics mentioned. Some results (and their proofs) did not seem ready for inclusion in a book, and there may be relevant results that have escaped the author's attention.

There is a large body of literature on random graphs, and it is not possible to review it here.
Among the probabilistic tools that have been used to analyze random structures are the method of moments, Poisson and Gaussian approximations, generating functions using the saddle-point method, Tauberian-type theorems, analysis of singularities, and martingale theory. In the past two decades, a method called the generalized scheme of allocation has been widely used in probabilistic combinatorics. It is so named because of its connection with the problem of assigning $n$ objects randomly to $N$ cells.

Let $\eta_1, \ldots, \eta_N$ be random variables that are, for example, the sizes of components of a graph. If there are independent random variables $\xi_1, \ldots, \xi_N$ so that the joint distribution of $\eta_1, \ldots, \eta_N$ for any integers $k_1, \ldots, k_N$ can be written as

$$P\{\eta_1 = k_1, \ldots, \eta_N = k_N\} = P\{\xi_1 = k_1, \ldots, \xi_N = k_N \mid \xi_1 + \cdots + \xi_N = n\},$$

where $n$ is a positive integer, then we say that $\eta_1, \ldots, \eta_N$ satisfy the generalized scheme of allocation with parameters $n$ and $N$ and independent random variables $\xi_1, \ldots, \xi_N$.

Graph evolution is the random process of sequentially adding new edges to a graph. For many classes of random graphs with $n$ labeled vertices and $T$ edges, the parameter $\theta = 2T/n$ plays the role of time in the process; various graph properties often change abruptly at the critical point $\theta = 1$. Graph evolution is the most fascinating object in the theory of random graphs, and it appears that it is well suited to the generalized scheme. We will show that applying generalized schemes makes it possible to analyze random graphs at different stages of their evolution and to obtain limit distributions in those cases in which only properties similar to the law of large numbers have been proved.

The theory of random equations in finite fields is shared by probability, combinatorics, and algebra. In this book, we will consider systems of linear equations in GF(2) with random coefficients. The matrix of such a system corresponds to a random graph or hypergraph; therefore, results on random graphs help to study these systems. We are sure that this application alone justifies developing the theory of random graphs.
The theory of random permutations is a well-developed branch of probabilistic combinatorics. Although Goncharov has investigated the cycle structure of a random permutation in great detail, there is still great interest in this area. We will fully describe the asymptotic behavior of $P\{\nu_n = k\}$ for the total number $\nu_n$ of cycles in a random permutation for all possible behaviors of the parameters $n$ and $k = k(n)$ as $n \to \infty$. We will also give some of the asymptotic results for the number of solutions of the equation $X^d = e$, where $X \in S_n$ is unknown, $d$ is a fixed positive integer, and $e$ is the identity of the group $S_n$.

Although the generalized scheme of allocation cannot be applied to nonequiprobable graphs, we present some results in this situation by using the method of moments. The statistical applications of nonequiprobable graphs call for the development of regular methods of analyzing these structures.

The book consists of five chapters. Chapter 1 describes the generalized scheme of allocation and its applications to a random forest of nonrooted trees, a random graph consisting of unicyclic components, and a random graph with a mixture of trees and unicyclic components. In Chapter 2, these results are applied to the study of the evolution of random graphs. Chapter 3 is devoted to systems of random linear equations in GF(2). Much of this branch of probabilistic combinatorics is the work of Russian mathematicians; this is the first English-language presentation of many of the results. Random permutations are considered in Chapter 4, and Chapter 5 contains some results on permutation equations of the form $X^d = e$.

Most results presented in this book derive from work done over the past fifteen years; notes and references can be found in the last section of each chapter. (It is, of course, impossible to give a complete list in each particular area.) In addition to articles used in the text, the summary sections of all chapters include references to papers on related topics, especially those in which the same results were obtained by other methods.

We assume that the reader is familiar with basic combinatorics. This book should be accessible to those who have completed standard courses of mathematical analysis and probability theory. Section 1.1 includes a list of pertinent results from probability.

This book continues in the tradition of Random Mappings [78] and differs from other treatments of random graphs in the systematic use of the generalized scheme of allocation. We hope that the chapter on systems of random linear equations in GF(2) will be of interest to a broad audience.

I wish to express my sincere appreciation to G.-C. Rota, who encouraged me to write this book for the Encyclopedia of Mathematics series, even though there are already several excellent books on random graphs. My greatest concern is writing the book in English. I am indebted to the editors who have brought the text to an acceptable form.
It is apparent that no amount of editing can erase the heavy Russian accent of my written English, so my special thanks go to those readers who will not be deterred by the language of the book. I greatly appreciate the support I received from my colleagues at the Steklov Mathematical Institute while I wrote this book.
1. The generalized scheme of allocation and the components of random graphs

1.1. The probabilistic approach to enumerative combinatorial problems

The solution to enumerative combinatorial problems consists in finding an exact or approximate expression for the number of combinatorial objects possessing the property under investigation. In this book, the probabilistic approach to enumerative combinatorial problems is adopted.

The fundamental notion of probability theory is the probability space $(\Omega, \mathcal{A}, P)$, where $\Omega$ is a set of arbitrary elements, $\mathcal{A}$ is a set of subsets of $\Omega$ forming a $\sigma$-algebra of events with the operations of union and intersection of sets, and $P$ is a nonnegative countably additive function defined for each event $A \in \mathcal{A}$ so that $P(\Omega) = 1$. The set $\Omega$ is called the space of elementary events and $P$ is a probability. A random variable is a real-valued measurable function $\xi = \xi(\omega)$ defined for all $\omega \in \Omega$.

Suppose $\Omega$ consists of finitely many elements. Then the probability $P$ is defined on all subsets of $\Omega$ if it is defined for each elementary event $\omega \in \Omega$. In this case, any real-valued function $\xi = \xi(\omega)$ on such a space of elementary events is a random variable.

Instead of a real-valued function, one may consider a function $f(\omega)$ taking values from some set $Y$ of arbitrary elements. Such a function $f(\omega)$ may be considered a generalization of a random variable and is called a random element of the set $Y$.

In studying combinatorial objects, we consider probability spaces that have a natural combinatorial interpretation: For the space of elementary events $\Omega$, we take the set of combinatorial objects under investigation and assign the same probability to all the elements of the set. In this case, numerical characteristics of combinatorial objects of $\Omega$ become random variables. The term "random element of the set $\Omega$" is usually used for the identity function $f(\omega) = \omega$, $\omega \in \Omega$, mapping each element of the set of combinatorial objects into itself.
Since the uniform distribution is assumed on $\Omega$, the probability that the identity function $f$ takes any fixed value $\omega$ is the same for all $\omega \in \Omega$. Hence the notion of a random combinatorial object of $\Omega$, understood as the identity function $f(\omega) = \omega$, agrees with the usual notion of a random element of a set as an element sampled from all elements of the set with equal probabilities.

Note that a random combinatorial object with the same distribution could also be defined on larger probability spaces. For our purposes, however, the natural construction presented here is sufficient for the most part. The exceptions are those few cases that involve several independent random combinatorial objects and in which it would be necessary to resort to a richer probability space, such as the direct product of the natural probability spaces.

Since we use probability spaces with uniform distributions, in spite of the probabilistic terminology, the problems considered are in essence enumeration problems of combinatorial analysis. The probabilistic approach furnishes a convenient form of representation and helps us effectively use the methods of asymptotic analysis that have been well developed in the theory of probability.

Thus, in the probabilistic approach, numerical characteristics of a random combinatorial object are random variables. The main characteristic of a random variable $\xi$ is its distribution function $F(x)$ defined for any real $x$ as the probability of the event $\{\xi \le x\}$, that is,

$$F(x) = P\{\xi \le x\}.$$

The distribution function $F(x)$ defines a probability distribution on the real line called the distribution of the random variable $\xi$. With respect to this distribution, given a function $g(x)$, the Lebesgue-Stieltjes integral

$$\int_{-\infty}^{\infty} g(x)\,dF(x)$$

can be defined. The probabilistic approach has advantages in the asymptotic investigations of combinatorial problems.
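The construction above can be made concrete on a small example. The following Python sketch is an illustration added here, not taken from the text: it takes $\Omega = S_4$ with the uniform distribution and treats the number of cycles of a permutation as a random variable, computing its exact distribution by enumeration. The function names `cycle_count` and `cycle_count_distribution` are hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations


def cycle_count(perm):
    """Number of cycles of a permutation given as a tuple: i -> perm[i]."""
    seen = [False] * len(perm)
    cycles = 0
    for start in range(len(perm)):
        if not seen[start]:
            cycles += 1
            j = start
            while not seen[j]:
                seen[j] = True
                j = perm[j]
    return cycles


def cycle_count_distribution(n):
    """Exact distribution of the number of cycles under the uniform law on S_n."""
    counts = Counter(cycle_count(p) for p in permutations(range(n)))
    total = sum(counts.values())  # equals n!
    return {k: Fraction(c, total) for k, c in sorted(counts.items())}


# Each of the 4! = 24 permutations is an elementary event of probability 1/24.
dist = cycle_count_distribution(4)
mean = sum(k * p for k, p in dist.items())
print(dist)
print(mean)
```

Enumeration is feasible only for very small $n$; it is meant to show that the "probabilities" here are nothing but normalized counts of combinatorial objects.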
As a rule, we have a sequence of random variables $\xi_n$, $n = 1, 2, \ldots$, each of which describes a characteristic of the random combinatorial object under consideration, and we are interested in the asymptotic behavior of the distribution functions $F_n(x) = P\{\xi_n \le x\}$ as $n \to \infty$.

A sequence of distributions with distribution functions $F_n(x)$ converges weakly to a distribution with the distribution function $F(x)$ if, for any bounded continuous function $g(x)$,

$$\int_{-\infty}^{\infty} g(x)\,dF_n(x) \to \int_{-\infty}^{\infty} g(x)\,dF(x)$$

as $n \to \infty$. The weak convergence of distributions is directly connected with the pointwise convergence of the distribution functions as follows.
Theorem 1.1.1. A sequence of distribution functions $F_n(x)$ converges to a distribution function $F(x)$ at all continuity points if and only if the corresponding sequence of distributions converges weakly to the distribution with distribution function $F(x)$.

In a sense, the distribution, or the distribution function $F(x)$, characterizes the random variable $\xi$. The moments of $\xi$ are simple characteristics. If

$$\int_{-\infty}^{\infty} |x|\,dF(x)$$

exists, then

$$E\xi = \int_{-\infty}^{\infty} x\,dF(x)$$

is called the mathematical expectation, or mean, of the random variable $\xi$. Further,

$$m_r = E\xi^r = \int_{-\infty}^{\infty} x^r\,dF(x)$$

is called the $r$th moment, or the moment of $r$th order (if the integral of $|x|^r$ exists).

In probabilistic combinatorics, one usually considers nonnegative integer-valued random variables. For such a random variable, the factorial moments are natural characteristics. We denote the $r$th factorial moment by

$$m_{(r)} = E\,\xi(\xi - 1)\cdots(\xi - r + 1).$$

If a distribution function $F(x)$ can be represented in the form

$$F(x) = \int_{-\infty}^{x} p(u)\,du,$$

where $p(u) \ge 0$, then we say that the distribution has a density $p(u)$.

In addition to the distribution function, it is convenient to represent the distribution of an integer-valued random variable $\xi$ by the probabilities of its individual values. For $\xi$, we will use the notation

$$p_k = P\{\xi = k\}, \quad k = 0, 1, \ldots,$$

and for integer-valued nonnegative random variables $\xi_n$,

$$p_k^{(n)} = P\{\xi_n = k\}, \quad k = 0, 1, \ldots.$$

It is clear that

$$E\xi = \sum_{k=0}^{\infty} k\,p_k$$

if this series converges. It is not difficult to see that the following assertion is true.
Theorem 1.1.2. A sequence of distributions $\{p_k^{(n)}\}$, $n = 1, 2, \ldots$, converges weakly to a distribution $\{p_k\}$ if and only if for every fixed $k = 0, 1, \ldots$,

$$p_k^{(n)} \to p_k$$

as $n \to \infty$.

If an estimate of the probability $P\{\xi > 0\}$ is needed for a nonnegative integer-valued random variable $\xi$, then the simple inequality

$$P\{\xi > 0\} = \sum_{k=1}^{\infty} P\{\xi = k\} \le \sum_{k=1}^{\infty} k\,p_k = E\xi \qquad (1.1.1)$$

can be useful. In particular, for a sequence $\xi_n$, $n = 1, 2, \ldots$, of such random variables with $E\xi_n \to 0$ as $n \to \infty$, it follows that $P\{\xi_n > 0\} \to 0$.

Since it is generally easier to calculate the moments of a random variable than the whole distribution, one wants a criterion for the convergence of a sequence of distributions based on the corresponding moments. But, first, it should be noted that even if a random variable $\xi$ has moments of all orders, its distribution cannot, in general, be reconstructed on the basis of these moments, since there exist distinct distributions that have the same sequences of moments. For example, it is not difficult to confirm that for any $n = 1, 2, \ldots$,

$$\int_0^{\infty} x^n e^{-x^{1/4}} \sin x^{1/4}\,dx = 0.$$

Hence, for $-1 < a < 1$, the function

$$p_a(x) = \frac{1}{24}\,e^{-x^{1/4}}\left(1 - a \sin x^{1/4}\right)$$

is the density of a distribution on $[0, \infty)$ whose moments do not depend on $a$.

Thus the distribution functions with moments of all orders are divided into two classes: The first class contains the functions that may be uniquely reconstructed from their moments, and the second class contains the functions that cannot be reconstructed from their moments.

There are several sufficient conditions for the moment problem to have a unique solution. Let

$$M_n = \int_{-\infty}^{\infty} |x|^n\,dF(x).$$

A distribution function $F(x)$ is uniquely reconstructed by the sequence $m_r$, $r = 1, 2, \ldots$, of its moments if there exists $\lambda$ such that

$$\frac{1}{n}\,M_n^{1/n} \le \lambda. \qquad (1.1.2)$$
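Returning to the example of distinct distributions with equal moments given above, the vanishing of the integral of $x^n e^{-x^{1/4}} \sin x^{1/4}$ can be checked by a short computation. The derivation below is a sketch added for illustration; it assumes only the substitution $x = t^4$ and the standard formula $\int_0^\infty t^{s-1} e^{-(1-i)t}\,dt = \Gamma(s)(1-i)^{-s}$.

```latex
\begin{align*}
\int_0^\infty x^n e^{-x^{1/4}} \sin x^{1/4}\,dx
  &= 4\int_0^\infty t^{4n+3} e^{-t} \sin t\,dt
   = 4\,\operatorname{Im}\int_0^\infty t^{4n+3} e^{-(1-i)t}\,dt \\
  &= 4\,\Gamma(4n+4)\,\operatorname{Im}\,(1-i)^{-(4n+4)}
   = 4\,(4n+3)!\,\operatorname{Im}\,(-4)^{-(n+1)} = 0 ,
\end{align*}
```

since $(1-i)^4 = -4$ is real. The same computation with $\sin$ removed gives $\int_0^\infty e^{-x^{1/4}}\,dx = 4\,\Gamma(4) = 24$, which accounts for the normalizing constant of a density proportional to $e^{-x^{1/4}}$.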
The following theorem describing the so-called method of moments is applicable only to the first class of distribution functions.

Theorem 1.1.3. If distribution functions $F_n(x)$, $n = 1, 2, \ldots$, have the moments of all orders and for any fixed $r = 1, 2, \ldots$,

$$m_r^{(n)} = \int_{-\infty}^{\infty} x^r\,dF_n(x) \to m_r, \quad |m_r| < \infty,$$

as $n \to \infty$, then there exists a distribution function $F(x)$ such that for any fixed $r = 1, 2, \ldots$,

$$m_r = \int_{-\infty}^{\infty} x^r\,dF(x),$$

and from the sequence $F_n(x)$, $n = 1, 2, \ldots$, it is possible to select a subsequence $F_{n_k}(x)$, $k = 1, 2, \ldots$, that converges to $F(x)$ as $k \to \infty$ at every continuity point of $F(x)$. If the sequence $m_r$, $r = 1, 2, \ldots$, uniquely determines the distribution function $F(x)$, then $F_n(x) \to F(x)$ as $n \to \infty$ at every continuity point of $F(x)$.

Note that the normal (Gaussian) and Poisson distributions are uniquely reconstructible by their moments.

To use the method of moments, it is necessary to calculate moments of random variables. One useful method of calculating moments of integer-valued random variables is to represent them as sums of random variables that take only the values 0 and 1.

Theorem 1.1.4. If $S_n = \xi_1 + \cdots + \xi_n$, and the random variables $\xi_1, \ldots, \xi_n$ take only the values 0 and 1, then for any $m = 1, 2, \ldots, n$,

$$S_n(S_n - 1)\cdots(S_n - m + 1) = \sum \xi_{i_1}\cdots\xi_{i_m},$$

where the summation is taken over all different ordered sets of different indices $\{i_1, \ldots, i_m\}$, the number of which is equal to $\binom{n}{m} m!$.

Generating functions also provide a useful tool for solving many problems related to distributions of nonnegative integer-valued random variables. The complex-valued function

$$\varphi(z) = \varphi_\xi(z) = \sum_{k=0}^{\infty} p_k z^k = E z^{\xi} \qquad (1.1.3)$$

is called the generating function of the distribution of the random variable $\xi$. It is defined at least for $|z| \le 1$. For example, for the Poisson distribution with parameter $\lambda$, which is defined by the probabilities

$$p_k = \frac{\lambda^k}{k!}\,e^{-\lambda}, \quad k = 0, 1, \ldots,$$

the generating function is $e^{\lambda(z-1)}$.

Relation (1.1.3) determines a one-to-one correspondence between the generating functions and the distributions of nonnegative integer-valued random variables, since the distribution can be reconstructed by using the formula

$$p_k = \frac{1}{k!}\,\varphi^{(k)}(0), \quad k = 0, 1, \ldots. \qquad (1.1.4)$$

Generating functions are especially convenient for the investigation of sums of independent random variables. If $\xi_1, \ldots, \xi_n$ are independent nonnegative integer-valued random variables and $S_n = \xi_1 + \cdots + \xi_n$, then

$$\varphi_{S_n}(z) = \varphi_{\xi_1}(z)\cdots\varphi_{\xi_n}(z).$$

The correspondence between the generating functions and the distributions is continuous in the following sense.

Theorem 1.1.5. Let $\{p_k^{(n)}\}$, $n = 1, 2, \ldots$, be a sequence of distributions. If for any $k = 0, 1, \ldots$,

$$p_k^{(n)} \to p_k$$

as $n \to \infty$, then the sequence of corresponding generating functions $\varphi_n(z)$, $n = 1, 2, \ldots$, converges to the generating function of the sequence $\{p_k\}$ uniformly in any circle $|z| \le r < 1$.

In particular, if $\{p_k\}$ is a distribution, then the sequence of corresponding generating functions converges to the generating function $\varphi(z)$ of the distribution $\{p_k\}$ uniformly in any circle $|z| \le r < 1$.

Theorem 1.1.6. If the sequence of generating functions $\varphi_n(z)$, $n = 1, 2, \ldots$, of the distributions $\{p_k^{(n)}\}$ converges to a generating function $\varphi(z)$ of a distribution $\{p_k\}$ on a set $M$ that has a limit point inside of the circle $|z| < 1$, then the distributions $\{p_k^{(n)}\}$ converge weakly to the distribution $\{p_k\}$.

Since a generating function $\varphi(z) = \sum_{k=0}^{\infty} p_k z^k$ is analytic, its coefficients can be represented by the Cauchy formula

$$p_n = \frac{1}{n!}\,\varphi^{(n)}(0) = \frac{1}{2\pi i}\oint_C \frac{\varphi(z)\,dz}{z^{n+1}},$$

where the integral is over a contour $C$ that lies inside the domain of analyticity of $\varphi(z)$ and contains the point $z = 0$.
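Two of the statements above lend themselves to a quick numerical check: the product rule for generating functions of independent summands, and Theorem 1.1.4 for $m = 2$, which after taking expectations of independent indicators gives $E\,S_n(S_n - 1) = \sum_{i \ne j} p_i p_j$. The following Python sketch is an illustration under these assumptions; the probabilities `ps` and all function names are hypothetical.

```python
from itertools import product
from math import isclose, prod


def poly_mul(a, b):
    """Multiply two polynomials given as coefficient lists (index = power of z)."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def sum_distribution_via_gf(ps):
    """Distribution of S_n for independent indicators with P{xi_i = 1} = ps[i],
    read off as the coefficients of the product of the factors (1 - p_i) + p_i z."""
    coeffs = [1.0]
    for p in ps:
        coeffs = poly_mul(coeffs, [1.0 - p, p])
    return coeffs


def sum_distribution_brute_force(ps):
    """The same distribution obtained by enumerating all 2^n outcomes."""
    dist = [0.0] * (len(ps) + 1)
    for outcome in product((0, 1), repeat=len(ps)):
        weight = prod(p if x else 1.0 - p for p, x in zip(ps, outcome))
        dist[sum(outcome)] += weight
    return dist


def second_factorial_moment(dist):
    """E S(S - 1) computed from the probabilities of individual values."""
    return sum(k * (k - 1) * pk for k, pk in enumerate(dist))


ps = [0.1, 0.5, 0.25, 0.7]  # illustrative success probabilities
gf = sum_distribution_via_gf(ps)
bf = sum_distribution_brute_force(ps)

# Theorem 1.1.4 with m = 2: sum over ordered pairs of different indices.
pairs = sum(ps[i] * ps[j]
            for i in range(len(ps)) for j in range(len(ps)) if i != j)

print(all(isclose(x, y, abs_tol=1e-12) for x, y in zip(gf, bf)))
print(isclose(second_factorial_moment(gf), pairs, abs_tol=1e-12))
```

Multiplying coefficient lists is exactly the convolution of the distributions, so the first check restates the product rule in terms of probabilities.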
Thus, if we are interested in the behavior of $p_n$ as $n \to \infty$, then we have to be able to estimate contour integrals of the form

$$\frac{1}{2\pi i}\oint_C g(z)\,e^{\lambda f(z)}\,dz,$$

where $g(z)$ and $f(z)$ are analytic in the neighborhood of the curve of integration $C$ and $\lambda$ is a real parameter tending to infinity.

The saddle-point method is used to estimate such integrals. The contour of integration $C$ may be chosen in different ways. The saddle-point method requires choosing the contour $C$ in such a way that it passes through the point $z_0$, which is a root of the equation $f'(z) = 0$. Such a point is called the saddle point, since the function $\operatorname{Re} f(z)$ has a graph similar to a saddle or mountain pass. The saddle-point method requires choosing the contour of integration such that it crosses the saddle point $z_0$ in the direction of the steepest descent. However, finding such a contour and applying it are complicated problems, so for the sake of simplicity one usually does not choose the best contour, hence losing some accuracy in the remainder term when estimating the integral.

A parametric representation of the contour transforms the contour integral to an integral with a real variable of integration. Therefore the following theorem on estimating integrals with increasing parameters, based on Laplace's method, sometimes provides an answer to the initial question on estimating integrals.

Theorem 1.1.7. If the integral

$$G(\lambda) = \int_{-\infty}^{\infty} g(t)\,e^{\lambda f(t)}\,dt$$

converges absolutely for some $\lambda = \lambda_0$, that is,

$$\int_{-\infty}^{\infty} |g(t)|\,e^{\lambda_0 f(t)}\,dt \le M;$$

if the function $f(t)$ attains its maximum at a point $t_0$ and in a neighborhood of this point

$$f(t) = f(t_0) + a_2 (t - t_0)^2 + a_3 (t - t_0)^3 + \cdots$$

with $a_2 < 0$; if for an arbitrary small $\delta > 0$, there exists $h = h(\delta) > 0$ such that

$$f(t_0) - f(t) \ge h \quad \text{for } |t - t_0| \ge \delta;$$

and if as $t \to t_0$,

$$g(t) = c\,(t - t_0)^{2m}\,(1 + O(|t - t_0|)),$$

where $c$ is a nonzero constant and $m$ is a nonnegative integer, then, as $\lambda \to \infty$,

$$G(\lambda) = e^{\lambda f(t_0)}\,\lambda^{-m-1/2}\,c\,c_1^{2m+1}\,\Gamma(m + 1/2)\,\bigl(1 + O(1/\sqrt{\lambda})\bigr),$$

where $\Gamma(x)$ is the Euler gamma function and

$$c_1 = \frac{1}{\sqrt{-a_2}}.$$

In particular, if $m = 0$, then $c = g(t_0)$, and as $\lambda \to \infty$,

$$G(\lambda) = e^{\lambda f(t_0)}\,g(t_0)\,\sqrt{\frac{2\pi}{-\lambda f''(t_0)}}\,\bigl(1 + O(1/\sqrt{\lambda})\bigr), \qquad (1.1.5)$$

where $f''(t_0) = 2a_2$.

To demonstrate that this rather complicated theorem can really be used, let us estimate the integral

$$\Gamma(\lambda + 1) = \int_0^{\infty} x^{\lambda} e^{-x}\,dx$$

as $\lambda \to \infty$, and obtain the Stirling formula. The change of variables $x = \lambda t$ leads to the equation

$$\Gamma(\lambda + 1) = \lambda^{\lambda+1}\int_0^{\infty} t^{\lambda} e^{-\lambda t}\,dt = \lambda^{\lambda+1} e^{-\lambda}\int_0^{\infty} e^{-\lambda(t - 1 - \log t)}\,dt.$$

Here $g(t) = 1$, and

$$f(t) = -(t - 1 - \log t), \quad f(1) = 0, \quad f'(1) = 0, \quad f''(1) = -1.$$

The conditions of the theorem are fulfilled; therefore, by (1.1.5),

$$G(\lambda) = \int_0^{\infty} e^{\lambda f(t)}\,dt = \sqrt{2\pi/\lambda}\,\bigl(1 + O(1/\sqrt{\lambda})\bigr),$$

and for the Euler gamma function, we obtain the representation

$$\Gamma(\lambda + 1) = \lambda^{\lambda+1/2} e^{-\lambda}\sqrt{2\pi}\,\bigl(1 + O(1/\sqrt{\lambda})\bigr)$$

as $\lambda \to \infty$, coinciding with the Stirling formula, except for the remainder term, which can be improved to $O(1/\lambda)$.

Generating functions are only suited for nonnegative integer-valued random variables. A more universal method of proving theorems on the convergence of sequences of random variables is provided by characteristic functions. The characteristic function of a random variable $\xi$ or the characteristic function of its distribution is defined as

$$\varphi(t) = \varphi_\xi(t) = E e^{it\xi} = \int_{-\infty}^{\infty} e^{itx}\,dF(x), \qquad (1.1.6)$$

where $-\infty < t < \infty$ and $F(x)$ is the distribution function of $\xi$. If the $r$th moment $m_r$ exists, then the characteristic function $\varphi(t)$ is $r$ times differentiable, and $\varphi^{(r)}(0) = i^r m_r$.
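The Stirling approximation obtained above is easy to check numerically. The following Python sketch, added as an illustration, compares $\Gamma(\lambda + 1)$ with $\lambda^{\lambda+1/2} e^{-\lambda}\sqrt{2\pi}$ on the logarithmic scale (to avoid overflow) using the standard library function `math.lgamma`; the function name `stirling_ratio` is hypothetical. It relies on the known expansion in which the relative error behaves like $1/(12\lambda)$, in agreement with the improved remainder $O(1/\lambda)$.

```python
import math


def stirling_ratio(lam):
    """Gamma(lam + 1) divided by lam^(lam + 1/2) * exp(-lam) * sqrt(2 pi)."""
    log_exact = math.lgamma(lam + 1.0)
    log_approx = (lam + 0.5) * math.log(lam) - lam + 0.5 * math.log(2.0 * math.pi)
    return math.exp(log_exact - log_approx)


for lam in (1.0, 10.0, 100.0, 1000.0):
    r = stirling_ratio(lam)
    # The last column should approach 1/12 as lam grows.
    print(lam, r, lam * (r - 1.0))
```

The ratio tends to 1 and the rescaled error `lam * (r - 1.0)` stabilizes, which is what a remainder of order $1/\lambda$ predicts.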
Characteristic functions are convenient for investigating sums of independent random variables, since if $S_n = \xi_1 + \cdots + \xi_n$, where $\xi_1, \ldots, \xi_n$ are independent random variables, then

$$\varphi_{S_n}(t) = \varphi_{\xi_1}(t)\cdots\varphi_{\xi_n}(t).$$

The characteristic function of the normal distribution with parameters $(m, \sigma^2)$ and density

$$p(x) = \frac{1}{\sigma\sqrt{2\pi}}\,e^{-(x-m)^2/(2\sigma^2)}$$

is

$$\varphi(t) = e^{imt - \sigma^2 t^2/2}.$$

Relation (1.1.6) defines a one-to-one correspondence between characteristic functions and distributions. There are different inversion formulas that provide a formal possibility of reconstructing a distribution from its characteristic function, but they have limited practical applications. We state the simplest version of the inversion formulas.

Theorem 1.1.8. If a characteristic function $\varphi(t)$ is absolutely integrable, then the corresponding distribution has the bounded density

$$p(x) = \frac{1}{2\pi}\int_{-\infty}^{\infty} e^{-itx}\varphi(t)\,dt.$$

The correspondence defined by (1.1.6) is continuous in the following sense.

Theorem 1.1.9. A sequence of distributions converges weakly to a limit distribution if and only if the corresponding sequence of characteristic functions $\varphi_n(t)$ converges to a continuous function $\varphi(t)$ as $n \to \infty$ at every fixed $t$, $-\infty < t < \infty$. In this case, $\varphi(t)$ is the characteristic function of the limit distribution, and the convergence $\varphi_n(t) \to \varphi(t)$ is uniform in any finite interval.

For a sequence $\xi_n$ of characteristics of random combinatorial objects, applying Theorem 1.1.9 gives the limit distribution function. But for integer-valued characteristics, one would rather have an indication of the local behavior, that is, the behavior of the probabilities of individual values. To this end the so-called local limit theorems of probability theory are used.

Let $\xi$ be an integer-valued random variable and $p_n = P\{\xi = n\}$. It is clear that $P\{\xi \in \Gamma_1\} = 1$, where $\Gamma_1$ is the lattice of all integers. If there exists a lattice $\Gamma_d$ with a span $d$ such that $P\{\xi \in \Gamma_d\} = 1$ and there is no lattice $\Gamma$ with span greater than $d$ such that $P\{\xi \in \Gamma\} = 1$, then $d$ is called the maximal span of the distribution of $\xi$. The characteristic function $\varphi(t)$ of the random variable $\xi$ is periodic with period $2\pi/d$ and $|\varphi(t)| < 1$ for $0 < t < 2\pi/d$.
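For a distribution with finite support, the maximal span is the greatest common divisor of the differences between support points, and the behavior of $|\varphi(t)|$ described above can be observed directly. The following Python sketch is an illustration added here; the distribution `dist` and the function names are hypothetical.

```python
import cmath
import math
from functools import reduce


def maximal_span(support):
    """Largest d such that all support points lie on a lattice {a + k d}."""
    points = sorted(set(support))
    if len(points) < 2:
        raise ValueError("need at least two support points")
    base = points[0]
    return reduce(math.gcd, (x - base for x in points[1:]))


def char_function(dist, t):
    """phi(t) = E exp(i t xi) for a finite distribution {value: probability}."""
    return sum(p * cmath.exp(1j * t * x) for x, p in dist.items())


# Illustrative distribution concentrated on the lattice {1 + 3k}.
dist = {1: 0.2, 4: 0.5, 10: 0.3}
d = maximal_span(dist)

# |phi| equals 1 at t = 2 pi / d and is strictly smaller inside (0, 2 pi / d).
print(d)
print(abs(char_function(dist, 2.0 * math.pi / d)))
print(abs(char_function(dist, math.pi / d)))
```

When the maximal span is 1, the modulus of the characteristic function is strictly less than 1 on $0 < |t| \le \pi$; this is the property that local limit theorems on the lattice of integers rely on.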
For integer-valued random variables, the inversion formula has the following form:

$$p_n = \frac{1}{2\pi}\int_{-\pi}^{\pi} e^{-itn}\varphi(t)\,dt.$$

Consider the sum $S_N = \xi_1 + \cdots + \xi_N$ of independent identically distributed integer-valued random variables $\xi_1, \ldots, \xi_N$. When the distributions of the summands are identical and do not depend on $N$, the problem of estimating the probabilities $P\{S_N = n\}$, as $N \to \infty$, has been completely solved. If there exist sequences of centering and normalizing numbers $A_N$ and $B_N$ such that the distributions of the random variables $(S_N - A_N)/B_N$ converge weakly to some distribution, then the limit distribution has a density. Moreover, a local limit theorem holds on the lattice with a span equal to the maximal span of the distribution of the random variable $\xi_1$. If the maximal span of the distribution of $\xi_1$ is 1, then the local theorem holds on the lattice of integers.

Theorem 1.1.10. Let $\xi_1, \xi_2, \ldots$ be a sequence of independent identically distributed integer-valued random variables and let there exist $A_N$ and $B_N$ such that, as $N \to \infty$ for any fixed $x$,

$$P\left\{\frac{S_N - A_N}{B_N} \le x\right\} \to \int_{-\infty}^{x} p(u)\,du.$$

Then, if the maximal span of the distribution of $\xi_1$ is 1,

$$B_N\,P\{S_N = n\} - p\bigl((n - A_N)/B_N\bigr) \to 0$$

uniformly in $n$.

Local limit theorems are of primary importance in what follows. Therefore, let us prove a local theorem on convergence to the normal distribution as a model for proofs of local limit theorems in more complex cases, which will be discussed later in the book.

Theorem 1.1.11. Let the independent identically distributed integer-valued random variables $\xi_1, \xi_2, \ldots$ have a mathematical expectation $a$ and a positive variance $\sigma^2$. Then, if the maximal span of the distribution of $\xi_1$ is 1,

$$\sigma\sqrt{N}\,P\{S_N = n\} - \frac{1}{\sqrt{2\pi}}\,e^{-(n - aN)^2/(2\sigma^2 N)} \to 0$$

uniformly in $n$ as $N \to \infty$.

Proof. Let

$$z = \frac{n - aN}{\sigma\sqrt{N}} \quad \text{and} \quad P_N(n) = P\{\xi_1 + \cdots + \xi_N = n\}.$$
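As a numerical illustration of the statement of Theorem 1.1.11 (it is not part of the proof), the following Python sketch takes Bernoulli summands with $P\{\xi_1 = 1\} = p$, so that $a = p$, $\sigma^2 = p(1-p)$, the maximal span is 1, and $S_N$ has a binomial distribution. The function name `local_limit_error` and the chosen values of $N$ and $p$ are assumptions made for the example.

```python
import math


def local_limit_error(N, p):
    """Largest value of | sigma sqrt(N) P{S_N = n} - exp(-z^2 / 2) / sqrt(2 pi) |
    over n = -1, 0, ..., N + 1, where S_N is a sum of N Bernoulli(p) variables.
    Outside 0..N the probability is zero and the normal term decreases,
    so the two extra points n = -1 and n = N + 1 cover all remaining integers."""
    a = p
    scale = math.sqrt(p * (1.0 - p)) * math.sqrt(N)
    worst = 0.0
    for n in range(-1, N + 2):
        if 0 <= n <= N:
            # Binomial probability computed on the log scale to avoid overflow.
            log_prob = (math.lgamma(N + 1) - math.lgamma(n + 1)
                        - math.lgamma(N - n + 1)
                        + n * math.log(p) + (N - n) * math.log(1.0 - p))
            prob = math.exp(log_prob)
        else:
            prob = 0.0
        z = (n - a * N) / scale
        gauss = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
        worst = max(worst, abs(scale * prob - gauss))
    return worst


for N in (10, 100, 1000):
    print(N, local_limit_error(N, 0.3))
```

The printed maximal discrepancy decreases as $N$ grows, in line with the uniform convergence asserted by the theorem.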
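If $\varphi(t)$ is the characteristic function of the random variable $\xi_1$, then the characteristic function of the sum $S_N = \xi_1 + \cdots + \xi_N$ is equal to $\varphi^N(t)$, and

$$\varphi^N(t) = \sum_{n=-\infty}^{\infty} e^{itn} P_N(n).$$

By the inversion formula,

$$P_N(n) = \frac{1}{2\pi}\int_{-\pi}^{\pi} e^{-itn}\varphi^N(t)\,dt. \tag{1.1.7}$$

Let $\varphi^*(t)$ denote the characteristic function of the centered random variable $\xi_1 - a$, which equals $\varphi(t)\exp\{-ita\}$. Since $n = aN + \sigma z\sqrt{N}$, it follows from (1.1.7) that

$$P_N(n) = \frac{1}{2\pi}\int_{-\pi}^{\pi} e^{-it\sigma z\sqrt{N}}(\varphi^*(t))^N\,dt.$$

After the substitution $x = t\sigma\sqrt{N}$, this equality takes the form

$$\sigma\sqrt{N}\,P_N(n) = \frac{1}{2\pi}\int_{-\pi\sigma\sqrt{N}}^{\pi\sigma\sqrt{N}} e^{-ixz}\bigl(\varphi^*(x/(\sigma\sqrt{N}))\bigr)^N dx. \tag{1.1.8}$$

By the inversion formula,

$$\frac{1}{\sqrt{2\pi}}e^{-z^2/2} = \frac{1}{2\pi}\int_{-\infty}^{\infty} e^{-ixz - x^2/2}\,dx. \tag{1.1.9}$$

It follows from (1.1.8) and (1.1.9) that the difference

$$R_N = 2\pi\left(\sigma\sqrt{N}\,P_N(n) - \frac{1}{\sqrt{2\pi}}e^{-z^2/2}\right) \tag{1.1.10}$$

can be written as the sum of the following four integrals:

$$I_1 = \int_{-A}^{A} e^{-ixz}\Bigl[\bigl(\varphi^*(x/(\sigma\sqrt{N}))\bigr)^N - e^{-x^2/2}\Bigr]dx,$$

$$I_2 = -\int_{A < |x|} e^{-ixz - x^2/2}\,dx,$$

$$I_3 = \int_{A < |x| < \varepsilon\sigma\sqrt{N}} e^{-ixz}\bigl(\varphi^*(x/(\sigma\sqrt{N}))\bigr)^N dx,$$

$$I_4 = \int_{\varepsilon\sigma\sqrt{N} < |x| < \pi\sigma\sqrt{N}} e^{-ixz}\bigl(\varphi^*(x/(\sigma\sqrt{N}))\bigr)^N dx,$$

where the constants $A$ and $\varepsilon$ will be chosen later. To see that $R_N \to 0$ as $N \to \infty$, we take an arbitrary $\delta > 0$ and show that $|R_N|$ can be made less than $\delta$ for sufficiently large $N$.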
For $I_2$, we have

$$|I_2| \le \int_{A < |x|} e^{-x^2/2}\,dx,$$

and $|I_2|$ can be made arbitrarily small by the choice of sufficiently large $A$.

Since $E\xi_1 = a$ and $D\xi_1 = \sigma^2$, for the characteristic function $\varphi^*(t)$ as $t \to 0$, we have

$$\varphi^*(t) = 1 - \frac{\sigma^2 t^2}{2} + o(t^2). \tag{1.1.11}$$

Let $\varphi_N(x)$ denote the characteristic function of $(S_N - aN)/(\sigma\sqrt{N})$, which equals $(\varphi^*(x/(\sigma\sqrt{N})))^N$. For any fixed $x$ and $N \to \infty$, we obtain from (1.1.11) the relation

$$\log\varphi_N(x) = N\log\varphi^*(x/(\sigma\sqrt{N})) = -\frac{x^2}{2} + o(1),$$

implying that for any fixed $x$ as $N \to \infty$,

$$\varphi_N(x) \to e^{-x^2/2}. \tag{1.1.12}$$

Moreover, as seen from (1.1.11), there exists $\varepsilon > 0$ such that, for $|t| \le \varepsilon$,

$$|\varphi^*(t)| \le 1 - \frac{\sigma^2 t^2}{4} \le e^{-\sigma^2 t^2/4}. \tag{1.1.13}$$

Using this inequality to estimate $I_3$, we find that

$$|I_3| \le \int_{A < |x| < \varepsilon\sigma\sqrt{N}} \bigl|\varphi^*(x/(\sigma\sqrt{N}))\bigr|^N dx \le \int_{A < |x| < \varepsilon\sigma\sqrt{N}} e^{-x^2/4}\,dx,$$

and by the choice of sufficiently large $A$, $|I_3|$ can be made arbitrarily small. Let $\varepsilon$ be such that (1.1.13) is satisfied and let $A$ be large enough so that $|I_2| < \delta/4$ and $|I_3| < \delta/4$.

Let us now estimate the integrals $I_1$ and $I_4$ for fixed $\varepsilon$ and $A$. Relation (1.1.12) implies that the distribution of $(S_N - aN)/(\sigma\sqrt{N})$ converges weakly, as $N \to \infty$, to the normal distribution with parameters $(0, 1)$. The convergence of the characteristic functions $\varphi_N(x)$ to the characteristic function of the normal law is uniform in any finite interval, and the integral $I_1$ tends to zero as $N \to \infty$.

For $I_4$, we have

$$|I_4| \le \int_{\varepsilon\sigma\sqrt{N} < |x| < \pi\sigma\sqrt{N}} \bigl|\varphi^*(x/(\sigma\sqrt{N}))\bigr|^N dx = \sigma\sqrt{N}\int_{\varepsilon < |t| < \pi} |\varphi(t)|^N\,dt.$$

Since the maximal span of the distribution of $\xi_1$ is 1,

$$\max_{\varepsilon \le |t| \le \pi} |\varphi(t)| = q < 1.$$
Hence,

$$|I_4| \le 2\pi\sigma\sqrt{N}\,q^N,$$

and $I_4 \to 0$ as $N \to \infty$. The estimates of $I_1$ and $I_4$ show that there exists $N_0$ such that $|I_1| < \delta/4$ and $|I_4| < \delta/4$ for $N > N_0$. Thus the difference $R_N$ tends to zero as $N \to \infty$ uniformly for all integers $n$.

In most applications of local theorems in this text, the distribution of the summands of the sum $S_N = \xi_1 + \cdots + \xi_N$ depends on the number of summands $N$. In such cases, there is no complete answer to the question of when the local theorem holds for $S_N$. Even in the case of convergence to the normal law, the known sufficient conditions for the validity of a local theorem cannot be deemed fully satisfactory. Hence, for each specific distribution whose parameters depend on the number of summands in the sum, it is necessary to invoke the classical scheme given above as a model. In the hope of finding simple sufficient conditions for the validity of local theorems for integer-valued identically distributed summands, as in Theorems 1.1.10 and 1.1.11, we will often omit the particularly cumbersome calculations arising in estimating characteristic functions.

If $\xi_1, \dots, \xi_N$ are independent identically distributed random variables such that $P\{\xi_1 = 1\} = p$ and $P\{\xi_1 = 0\} = q = 1 - p$ for $0 < p < 1$, then $S_N = \xi_1 + \cdots + \xi_N$ has the binomial distribution with parameters $(N, p)$, that is, for any $k = 0, 1, \dots, N$,

$$P\{S_N = k\} = \binom{N}{k} p^k q^{N-k}.$$

If $Npq \to \infty$, then the binomial distribution is approximated by the normal law. The following theorem, known as the local de Moivre-Laplace theorem, can be obtained by a direct analysis of the explicit formula.

Theorem 1.1.12. If $N \to \infty$ and $(1 + u^6)/(Npq) \to 0$, where

$$u = \frac{k - Np}{\sqrt{Npq}},$$

then

$$P\{S_N = k\} = \frac{1}{\sqrt{2\pi Npq}}\,e^{-u^2/2}(1 + o(1)).$$

Theorem 1.1.12 implies the well-known integral de Moivre-Laplace theorem.
Theorem 1.1.13. If $N \to \infty$ and $(1 + u^6)/(Npq) \to 0$, where

$$u = \frac{k - Np}{\sqrt{Npq}},$$

then

$$P\{S_N \le k\} = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{u} e^{-x^2/2}\,dx\,(1 + o(1)).$$

If $p \to 0$, then the binomial distribution is approximated by the Poisson law. It is well known that if $N \to \infty$ and $Np \to \lambda$, $0 < \lambda < \infty$, then for any fixed $k = 0, 1, \dots$,

$$P\{S_N = k\} = \frac{\lambda^k e^{-\lambda}}{k!}(1 + o(1)).$$

The Poisson approximation is also valid if $Np$ tends to infinity not too quickly.

Theorem 1.1.14. If $N \to \infty$, $Np \to \infty$, $(1 + u^2)p \to 0$, where

$$u = \frac{k - Np}{\sqrt{Np}},$$

then

$$P\{S_N = k\} = \frac{(Np)^k e^{-Np}}{k!}(1 + o(1)).$$

The Poisson distribution converges to the normal law as its parameter tends to infinity.

Theorem 1.1.15. If $(1 + u^6)/\lambda \to 0$, where $u = (k - \lambda)/\sqrt{\lambda}$, then

$$\frac{\lambda^k e^{-\lambda}}{k!} = \frac{1}{\sqrt{2\pi\lambda}}\,e^{-u^2/2}(1 + o(1)).$$

Sometimes it is necessary to estimate the tails of the binomial distribution in the form of an inequality with an explicit constant.

Theorem 1.1.16. For any $x > 0$,

$$P\{S_N - ES_N \ge Nx\} \le e^{-2Nx^2}.$$

1.2. The generalized scheme of allocation

In the past three decades, the so-called generalized scheme of allocation of particles has been applied to many probabilistic problems of combinatorics, and many of the results in this text were obtained by reducing combinatorial problems to such a generalized scheme.
Consider $n$ independent trials, each having $N$ equiprobable outcomes, $1, 2, \dots, N$. Let $\eta_i$ denote the number of occurrences of the $i$th outcome in this sequence of trials, $i = 1, 2, \dots, N$. The random variables $\eta_1, \dots, \eta_N$ have the multinomial distribution: If the nonnegative integers $k_1, \dots, k_N$ are such that $k_1 + \cdots + k_N = n$, then

$$P\{\eta_1 = k_1, \dots, \eta_N = k_N\} = \frac{n!}{k_1!\cdots k_N!\,N^n}. \tag{1.2.1}$$

The situation in which the multinomial distribution arises can be described in terms of an equiprobable scheme of allocating particles to cells. If $n$ particles are independently distributed with equal probabilities into $N$ cells labeled $1, 2, \dots, N$, then the contents of cells $\eta_1, \dots, \eta_N$ have the multinomial distribution (1.2.1).

In the scheme of allocating particles to cells yielding the multinomial distribution, the contents of cells can be obtained by independent sequential allocation of particles. If one does not require that the contents of cells can be obtained by some sequential allocation of particles, with a simple probability law governing the sequential trials, then any set of integer-valued nonnegative random variables $\eta_1, \dots, \eta_N$, such that $\eta_1 + \cdots + \eta_N = n$, can be viewed as a scheme of allocating $n$ particles to $N$ cells, and one can interpret $\eta_i$ as the number of particles in the cell with index $i$, $i = 1, 2, \dots, N$.

Some probabilistic problems of combinatorics can be treated by using generalized schemes of allocation in which the joint distribution of the contents of cells $\eta_1, \dots, \eta_N$ can be represented in the form

$$P\{\eta_1 = k_1, \dots, \eta_N = k_N\} = P\{\xi_1 = k_1, \dots, \xi_N = k_N \mid \xi_1 + \cdots + \xi_N = n\}, \tag{1.2.2}$$

where $\xi_1, \dots, \xi_N$ are independent identically distributed integer-valued random variables. The generalized scheme of allocating particles to cells is given by the parameters $n$ and $N$ and the distribution of the random variables $\xi_1, \dots, \xi_N$, which by relation (1.2.2) determines the joint distribution of the contents of the cells $\eta_1, \dots, \eta_N$.
Set

$$p_k = P\{\xi_1 = k\}, \quad k = 0, 1, \dots. \tag{1.2.3}$$

For the random variables $\eta_1, \dots, \eta_N$ with the multinomial distribution (1.2.1), relation (1.2.2) is satisfied if $\xi_1$ has the Poisson distribution with arbitrary parameter $\lambda$:

$$p_k = \frac{\lambda^k e^{-\lambda}}{k!}, \quad k = 0, 1, \dots. \tag{1.2.4}$$

Therefore the distribution of $\eta_1, \dots, \eta_N$ satisfying relation (1.2.2) for some distribution (1.2.3) can be viewed as a generalization of the multinomial distribution.
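The link between (1.2.1), (1.2.2), and (1.2.4) can be checked numerically for small $n$ and $N$. The following is a minimal sketch, not part of the original text: the function names are illustrative, and it relies on the standard fact that a sum of $N$ independent Poisson variables with parameter $\lambda$ is Poisson with parameter $N\lambda$.

```python
from fractions import Fraction
from itertools import product
from math import exp, factorial


def multinomial_pmf(ks, N):
    """Right-hand side of (1.2.1): n! / (k_1! ... k_N! N^n), as an exact fraction."""
    n = sum(ks)
    denom = N ** n
    for k in ks:
        denom *= factorial(k)
    return Fraction(factorial(n), denom)


def poisson_pmf(k, lam):
    """Poisson probability (1.2.4)."""
    return exp(-lam) * lam ** k / factorial(k)


def conditional_poisson_pmf(ks, lam):
    """P{xi_1 = k_1, ..., xi_N = k_N | xi_1 + ... + xi_N = n} for i.i.d. Poisson(lam)."""
    n, N = sum(ks), len(ks)
    joint = 1.0
    for k in ks:
        joint *= poisson_pmf(k, lam)
    # The sum of N independent Poisson(lam) variables is Poisson(N * lam).
    return joint / poisson_pmf(n, N * lam)


def max_discrepancy(n, N, lam):
    """Largest gap between the two sides of (1.2.2) over all cell contents."""
    cells = (ks for ks in product(range(n + 1), repeat=N) if sum(ks) == n)
    return max(
        abs(conditional_poisson_pmf(ks, lam) - float(multinomial_pmf(ks, N)))
        for ks in cells
    )
```

Calling `max_discrepancy(4, 3, lam)` for several values of `lam` should return a value at the level of floating-point rounding, which illustrates that the parameter $\lambda$ cancels in the conditional distribution.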
The term "classical scheme of allocation" has become common for the equiprobable scheme of allocating particles to cells leading to the multinomial distribution (1.2.1). The terminology of the classical scheme of allocating particles to cells proved to be convenient for describing a number of combinatorial problems where the multinomial distribution appears. Many results pertaining to the classical scheme of allocation can be obtained by applying relation (1.2.2) between the multinomial distribution and the Poisson distribution (1.2.4). Introducing generalized schemes of allocating particles not only broadens the scope of convenient language for describing combinatorial objects, but also offers the possibility of applying methods based on relation (1.2.2) that have been developed to analyze the classical scheme.

Let $\mu_r(n, N)$ denote the number of cells containing exactly $r$ particles in the generalized scheme of allocation with distributions (1.2.2) and (1.2.3). We show that the representation (1.2.2) can be used to study this random variable.

Let $\xi_1^{(r)}, \dots, \xi_N^{(r)}$ be independent identically distributed random variables whose distribution is linked with the distribution of $\xi_1, \dots, \xi_N$ as follows:

$$P\{\xi_1^{(r)} = k\} = P\{\xi_1 = k \mid \xi_1 \ne r\}, \quad k = 0, 1, \dots.$$

Also let

$$S_N = \xi_1 + \cdots + \xi_N, \qquad S_N^{(r)} = \xi_1^{(r)} + \cdots + \xi_N^{(r)}.$$

The following lemma expresses the distribution of $\mu_r(n, N)$ in terms of the probabilities of sums of independent identically distributed random variables.

Lemma 1.2.1.

$$P\{\mu_r(n, N) = k\} = \binom{N}{k} p_r^k (1 - p_r)^{N-k}\,\frac{P\{S_{N-k}^{(r)} = n - kr\}}{P\{S_N = n\}}.$$

Proof. Let $A_k^{(r)}$ be the event that exactly $k$ of the random variables $\xi_1, \dots, \xi_N$ take the value $r$. By equality (1.2.2),

$$P\{\mu_r(n, N) = k\} = P\{A_k^{(r)} \mid S_N = n\} = \frac{P\{A_k^{(r)},\, S_N = n\}}{P\{S_N = n\}}.$$

The lemma is derived by obvious manipulations of the numerator: The event $A_k^{(r)}$ can occur for $\binom{N}{k}$ distinct choices of random variables taking the value $r$; therefore

$$P\{A_k^{(r)},\, S_N = n\} = \binom{N}{k} p_r^k (1 - p_r)^{N-k}\, P\{S_N = n \mid \xi_1 \ne r, \dots, \xi_{N-k} \ne r,\ \xi_{N-k+1} = \cdots = \xi_N = r\} = \binom{N}{k} p_r^k (1 - p_r)^{N-k}\, P\{S_{N-k}^{(r)} = n - kr\}.$$
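In the generalized scheme of allocating particles, there is a rather simple approach to studying the order statistics $\eta_{(1)} \le \eta_{(2)} \le \cdots \le \eta_{(N)}$ constructed from the random variables $\eta_1, \dots, \eta_N$ arranged in nondecreasing order.

Let $\xi_1^{(A)}, \dots, \xi_N^{(A)}$ be independent identically distributed random variables such that

$$P\{\xi_1^{(A)} = k\} = P\{\xi_1 = k \mid \xi_1 \notin A\}, \quad k = 0, 1, \dots,$$

where $A$ is a subset of the set of natural numbers with $P\{\xi_1 \notin A\} > 0$. In particular, if $A$ consists of one value $r$, then $\xi_1^{(A)} = \xi_1^{(r)}$, where $\xi_1^{(r)}$ is the random variable defined preceding Lemma 1.2.1. Set

$$S_N^{(A)} = \xi_1^{(A)} + \cdots + \xi_N^{(A)}.$$

The following lemma reduces the study of distributions of order statistics to that of probabilities related to sums of independent random variables.

Lemma 1.2.2. For any positive integer $m$,

$$P\{\eta_{(N-m+1)} \le r\} = \sum_{k=0}^{m-1}\binom{N}{k} P_r^k (1 - P_r)^{N-k}\,\frac{P\{S_{N-k}^{(\bar{A}_r)} + S_k^{(A_r)} = n\}}{P\{S_N = n\}}, \tag{1.2.7}$$

where $A_r$ is the set of all nonnegative integers not exceeding $r$, $\bar{A}_r$ is its complement in the set of all nonnegative integers, $P_r = P\{\xi_1 > r\}$, and the sums $S_{N-k}^{(\bar{A}_r)}$ and $S_k^{(A_r)}$ are independent. Thus the summands of $S_{N-k}^{(\bar{A}_r)}$ are distributed as $\xi_1$ conditioned on $\xi_1 \le r$, and those of $S_k^{(A_r)}$ as $\xi_1$ conditioned on $\xi_1 > r$.

Proof. Let us prove (1.2.7) for $m = 1$. For the maximal order statistic $\eta_{(N)} = \max(\eta_1, \dots, \eta_N)$, by (1.2.2) and the independence of $\xi_1, \dots, \xi_N$, we have

$$P\{\eta_{(N)} \le r\} = P\{\eta_1 \le r, \dots, \eta_N \le r\} = P\{\xi_1 \le r, \dots, \xi_N \le r \mid S_N = n\} = (P\{\xi_1 \le r\})^N\,\frac{P\{S_N = n \mid \xi_1 \le r, \dots, \xi_N \le r\}}{P\{S_N = n\}}.$$

By using the random variables $\xi_1^{(\bar{A}_r)}, \dots, \xi_N^{(\bar{A}_r)}$, we finally obtain

$$P\{\eta_{(N)} \le r\} = (1 - P_r)^N\,\frac{P\{S_N^{(\bar{A}_r)} = n\}}{P\{S_N = n\}}. \tag{1.2.8}$$

Relations (1.2.6) and (1.2.7) for other values of $m$ are similarly proved.

For the joint distribution of the random variables $\mu_{r_1}(n, N), \dots, \mu_{r_s}(n, N)$, we can prove the following lemma as we did in Lemma 1.2.1.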
Lemma 1.2.3.

$$P\{\mu_{r_1}(n, N) = k_1, \dots, \mu_{r_s}(n, N) = k_s\} = \frac{N!\,p_{r_1}^{k_1}\cdots p_{r_s}^{k_s}\,(1 - p_{r_1} - \cdots - p_{r_s})^{N - k_1 - \cdots - k_s}}{k_1!\cdots k_s!\,(N - k_1 - \cdots - k_s)!} \times \frac{P\{S_{N - k_1 - \cdots - k_s}^{(r_1, \dots, r_s)} = n - k_1 r_1 - \cdots - k_s r_s\}}{P\{S_N = n\}},$$

where $s \ge 1$, $k_1, \dots, k_s, r_1, \dots, r_s$ are nonnegative integers, $r_1, \dots, r_s$ are distinct, and $S_M^{(r_1, \dots, r_s)}$ denotes the sum of $M$ independent random variables distributed as $\xi_1$ conditioned on $\xi_1 \notin \{r_1, \dots, r_s\}$.

Lemmas 1.2.1, 1.2.2, and 1.2.3 express the distributions of the random variables $\mu_r(n, N)$ and the order statistics $\eta_{(1)}, \eta_{(2)}, \dots, \eta_{(N)}$ in the generalized scheme of allocating particles in terms of probabilities related to sums of independent random variables. Obtaining limit distributions for the random variables $\mu_r(n, N)$ and $\eta_{(1)}, \eta_{(2)}, \dots, \eta_{(N)}$ is reduced to applying local limit theorems for sums of independent identically distributed integer-valued random variables.

We now give some examples of how combinatorial problems can be reduced to the generalized scheme of allocating particles to cells.

Example 1.2.1. Consider single-valued mappings of the set $X_n = \{1, 2, \dots, n\}$ into itself. A single-valued mapping $s$ of the set $X_n$ into itself can be represented as

$$s = \begin{pmatrix} 1 & 2 & \cdots & n \\ s_1 & s_2 & \cdots & s_n \end{pmatrix},$$

where $s_k$ denotes the image of $k$, $k = 1, 2, \dots, n$, under the mapping $s$. The mapping $s$ may be thought of as an oriented graph $\Gamma_n = \Gamma(X_n, W_n)$ with vertex set $X_n$ and arcs $W_n = \{(k, s_k),\ k = 1, 2, \dots, n\}$, where the arc $(k, s_k)$ is directed from $k$ to $s_k$, $k = 1, 2, \dots, n$. The number of arcs entering the vertex $k$ in the graph $\Gamma_n$, which is the number of pre-images of the element $k$ under the mapping $s$, is called the multiplicity of the vertex $k$.

Let $\Sigma_n$ denote the set of all single-valued mappings of $X_n$ into itself, and $\boldsymbol{\Gamma}_n$ the set of all graphs of these mappings. The number of elements of $\Sigma_n$ is obviously equal to $n^n$. If the uniform distribution is defined on the set $\Sigma_n$, then we obtain a probability space whose set of elementary events $\Omega$ is the set $\Sigma_n$; and the probability for any subset of $\Sigma_n$ is the number of elements in the subset divided by $n^n$. The random mapping $\sigma$ is any of the $n^n$ possible mappings with probability $P\{\sigma = s\} = n^{-n}$, $s \in \Sigma_n$. If

$$\sigma = \begin{pmatrix} 1 & 2 & \cdots & n \\ \sigma_1 & \sigma_2 & \cdots & \sigma_n \end{pmatrix},$$
where the random variable $\sigma_i$ is the random image of the element $i$, $i = 1, 2, \dots, n$, then, for any $s$,

$$P\{\sigma = s\} = P\{\sigma_1 = s_1, \dots, \sigma_n = s_n\} = n^{-n}.$$

Thus the random variables $\sigma_1, \dots, \sigma_n$ are independent and take the values $1, 2, \dots, n$ with equal probabilities.

Let $\eta_r$ denote the multiplicity of the vertex $r$ in the random mapping $\sigma$, $r = 1, 2, \dots, n$. The quantity $\eta_r$ is equal to the number of random variables $\sigma_1, \dots, \sigma_n$ taking the value $r$; thus, for nonnegative integers $k_1, \dots, k_n$ with $k_1 + \cdots + k_n = n$, the probability $P\{\eta_1 = k_1, \dots, \eta_n = k_n\}$ is equal to the sum of probabilities

$$P\{\sigma_1 = s_1, \dots, \sigma_n = s_n\} = n^{-n},$$

where among $s_1, \dots, s_n$ there are exactly $k_r$ values equal to $r$, $r = 1, 2, \dots, n$. The number of summands in this sum is obviously $n!/(k_1!\cdots k_n!)$; therefore

$$P\{\eta_1 = k_1, \dots, \eta_n = k_n\} = \frac{n!}{k_1!\cdots k_n!\,n^n}.$$

Thus the joint distribution of the multiplicities of the vertices $\eta_1, \dots, \eta_n$ of a random mapping is the multinomial distribution. Taking the vertices as cells and the arcs going into these vertices as particles, we obtain the classical scheme of allocating $n$ particles to $n$ cells with multinomial distribution of the contents of the cells $\eta_1, \dots, \eta_n$. For the random variables $\eta_1, \dots, \eta_n$, relation (1.2.2) holds:

$$P\{\eta_1 = k_1, \dots, \eta_n = k_n\} = P\{\xi_1 = k_1, \dots, \xi_n = k_n \mid \xi_1 + \cdots + \xi_n = n\},$$

in which $\xi_1, \dots, \xi_n$ are independent and identically Poisson-distributed.

The number of vertices $\mu_r(n)$ in a random mapping with multiplicity $r$ corresponds to the number of cells containing exactly $r$ particles in the classical scheme of allocating $n$ particles to $n$ cells; to study these variables, as well as the order statistics made up of the multiplicities of the vertices, one can invoke Lemmas 1.2.1, 1.2.2, and 1.2.3.

Example 1.2.2. Consider all distinct partitions of $n$ into $N$ summands not less than $r > 0$. The number of such partitions is $\binom{n - (r-1)N - 1}{N - 1}$.
Let us define the uniform distribution on the set of these partitions by assigning the probability $\binom{n - (r-1)N - 1}{N - 1}^{-1}$ to each partition $n = n_1 + \cdots + n_N$, $n_1, \dots, n_N \ge r$. Then $n$ can be written in the form $n = \eta_1 + \cdots + \eta_N$, where the summands $\eta_1, \dots, \eta_N$ are random variables. If $n_1, \dots, n_N \ge r$ and $n = n_1 + \cdots + n_N$, then

$$P\{\eta_1 = n_1, \dots, \eta_N = n_N\} = \binom{n - (r-1)N - 1}{N - 1}^{-1}.$$
The general scheme of allocation corresponding to this combinatorial problem is obtained if we use the geometric distribution for the distribution of the random variables $\xi_1, \dots, \xi_N$:

$$P\{\xi_1 = k\} = p^{k-r}(1 - p), \quad k = r, r+1, \dots, \quad 0 < p < 1.$$

Indeed, as is easily verified,

$$P\{\xi_1 = n_1, \dots, \xi_N = n_N \mid \xi_1 + \cdots + \xi_N = n\} = \binom{n - (r-1)N - 1}{N - 1}^{-1},$$

since, for geometrically distributed summands,

$$P\{\xi_1 = n_1, \dots, \xi_N = n_N\} = p^{n - rN}(1 - p)^N, \qquad P\{\xi_1 + \cdots + \xi_N = n\} = \binom{n - (r-1)N - 1}{N - 1} p^{n - rN}(1 - p)^N.$$

Example 1.2.3. Note that it is not necessary for the random variables $\xi_1, \dots, \xi_N$ in a generalized scheme to be identically distributed. Consider the following example. Draw $n$ balls at random without replacement from an urn containing $m_i$ balls of the $i$th color, $i = 1, \dots, N$. Let $\eta_i$ denote the number of balls drawn of the $i$th color, $i = 1, \dots, N$. It is easily seen that for nonnegative integers $n_1, \dots, n_N$ such that $n_1 + \cdots + n_N = n$,

$$P\{\eta_1 = n_1, \dots, \eta_N = n_N\} = \binom{m_1}{n_1}\cdots\binom{m_N}{n_N}\bigg/\binom{m}{n},$$

where $m = m_1 + \cdots + m_N$. If in the generalized scheme of allocation the random variables $\xi_1, \dots, \xi_N$ have the binomial distributions

$$P\{\xi_i = k\} = \binom{m_i}{k} p^k (1 - p)^{m_i - k},$$

where $0 < p < 1$, $k = 0, 1, \dots, m_i$, $i = 1, \dots, N$, then

$$P\{\xi_1 = n_1, \dots, \xi_N = n_N \mid \xi_1 + \cdots + \xi_N = n\} = \binom{m_1}{n_1}\cdots\binom{m_N}{n_N}\bigg/\binom{m}{n},$$

and the distribution of the random variables $\eta_1, \dots, \eta_N$ coincides with the conditional distribution of the independent random variables $\xi_1, \dots, \xi_N$ under the condition $\xi_1 + \cdots + \xi_N = n$. Thus $\eta_1, \dots, \eta_N$ may be viewed as contents of cells in the generalized scheme of allocation, in which the random variables $\xi_1, \dots, \xi_N$ have different binomial distributions.
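The identity in Example 1.2.3 can be verified exactly for a small urn. The sketch below is illustrative only: the function names and the sample urn are assumptions, and the probability of the conditioning event is computed by brute-force enumeration rather than by a closed formula. Exact rational arithmetic is used so that the comparison does not depend on rounding.

```python
from fractions import Fraction
from itertools import product
from math import comb


def hypergeometric_pmf(ns, ms):
    """P{eta_1 = n_1, ..., eta_N = n_N} when n balls are drawn without replacement."""
    num = 1
    for n_i, m_i in zip(ns, ms):
        num *= comb(m_i, n_i)
    return Fraction(num, comb(sum(ms), sum(ns)))


def binomial_pmf(k, m_i, p):
    """P{xi_i = k} for the binomial distribution with parameters (m_i, p)."""
    return comb(m_i, k) * p ** k * (1 - p) ** (m_i - k)


def conditional_binomial_pmf(ns, ms, p):
    """P{xi_1 = n_1, ..., xi_N = n_N | xi_1 + ... + xi_N = n} for independent binomials."""
    n = sum(ns)

    def joint(ks):
        prob = Fraction(1)
        for k, m_i in zip(ks, ms):
            prob *= binomial_pmf(k, m_i, p)
        return prob

    # Probability of the conditioning event, by enumerating all possible outcomes.
    all_outcomes = product(*(range(m_i + 1) for m_i in ms))
    total = sum(joint(ks) for ks in all_outcomes if sum(ks) == n)
    return joint(ns) / total
```

For instance, with `ms = (3, 2, 4)` and `p = Fraction(1, 3)`, the two functions agree for every admissible `ns`, and changing `p` leaves the conditional probabilities unchanged.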
Example 1.2.4. In a sense, the graph $\Gamma_n$ of a random mapping consists of trees. Indeed, the graph can be naturally decomposed into connected components. Clearly, each connected component of the graph $\Gamma_n$ contains exactly one cycle. Vertices in the cycle are called cyclic. If we remove the arcs joining the cyclic vertices, then the graph turns into a forest, that is, a graph consisting of rooted trees.

Recall that a rooted tree with $n + 1$ vertices is a connected undirected graph without cycles, with one special vertex called the root, and with $n$ nonroot labeled vertices. A rooted tree with $n + 1$ vertices has $n$ edges. In what follows, we view all edges of trees as directed away from the root, and the multiplicity of a vertex of a tree is defined as the number of edges emanating from it.

Let $T_n$ denote the set of all rooted trees with $n + 1$ vertices whose roots are labeled zero, and the $n$ nonroot vertices are labeled $1, 2, \dots, n$. The number of elements of the set $T_n$ is equal to $(n + 1)^{n-1}$.

A forest with $N$ roots and $n$ nonroot vertices is a graph, all of whose components are trees. The roots of these trees are labeled with $1, \dots, N$ and the nonroot vertices with $1, \dots, n$. We denote the set of all such forests by $T_{n,N}$. The number of elements in the set $T_{n,N}$ is $N(n + N)^{n-1}$. The number of forests in which the $k$th tree contains $n_k$ nonroot vertices, $k = 1, 2, \dots, N$, is

$$\frac{n!}{n_1!\cdots n_N!}\,(n_1 + 1)^{n_1 - 1}\cdots(n_N + 1)^{n_N - 1},$$

where the factor $n!/(n_1!\cdots n_N!)$ is the number of partitions of $n$ vertices into $N$ ordered groups, and $(n_k + 1)^{n_k - 1}$ is the number of trees that can be constructed from the $k$th group of vertices of each partition. Then

$$\sum \frac{n!\,(n_1 + 1)^{n_1 - 1}\cdots(n_N + 1)^{n_N - 1}}{n_1!\cdots n_N!} = N(n + N)^{n-1}, \tag{1.2.9}$$

where the summation is taken over nonnegative integers $n_1, \dots, n_N$ such that $n_1 + \cdots + n_N = n$.

Next, we define the uniform distribution on $T_{n,N}$. Let $\eta_k$ denote the number of nonroot vertices in the $k$th tree of a random forest in $T_{n,N}$, $k = 1, \dots, N$.
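For the random variables $\eta_1, \dots, \eta_N$, we have

$$P\{\eta_1 = n_1, \dots, \eta_N = n_N\} = \frac{n!\,(n_1 + 1)^{n_1}\cdots(n_N + 1)^{n_N}}{N(n + N)^{n-1}\,(n_1 + 1)!\cdots(n_N + 1)!}, \tag{1.2.10}$$

where $n_1, \dots, n_N$ are nonnegative integers and $n_1 + \cdots + n_N = n$.

Let us consider independent identically distributed random variables $\xi_1, \dots, \xi_N$ for which

$$P\{\xi_1 = k\} = \frac{(k + 1)^k x^{k+1}}{(k + 1)!\,\theta(x)}, \quad k = 0, 1, \dots, \tag{1.2.11}$$

where the parameter $x$ lies in the interval $0 < x < e^{-1}$ and the function $\theta(x)$ is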
defined as

$$\theta(x) = \sum_{k=1}^{\infty}\frac{k^{k-1}}{k!}\,x^k.$$

By using (1.2.9), we easily obtain

$$P\{\xi_1 + \cdots + \xi_N = n\} = \frac{N(n + N)^{n-1}\,x^{n+N}}{n!\,(\theta(x))^N};$$

hence, for any $x$, $0 < x < e^{-1}$, and for nonnegative integers $n_1, \dots, n_N$ such that $n_1 + \cdots + n_N = n$,

$$P\{\xi_1 = n_1, \dots, \xi_N = n_N \mid \xi_1 + \cdots + \xi_N = n\} = \frac{n!\,(n_1 + 1)^{n_1}\cdots(n_N + 1)^{n_N}}{N(n + N)^{n-1}\,(n_1 + 1)!\cdots(n_N + 1)!}. \tag{1.2.12}$$

The right-hand sides of (1.2.10) and (1.2.12) are identical, and the joint distribution of $\eta_1, \dots, \eta_N$ coincides with the distribution of $\xi_1, \dots, \xi_N$ under the condition that $\xi_1 + \cdots + \xi_N = n$. Thus, for the random variables $\eta_1, \dots, \eta_N$ and $\xi_1, \dots, \xi_N$, relation (1.2.2) holds, enabling us to study tree sizes in a random forest by using the generalized scheme of allocating particles into cells, with the random variables $\xi_1, \dots, \xi_N$ having the distribution given by (1.2.11).

1.3. Connectivity of graphs and the generalized scheme

Not pretending to give an exhaustive solution, let us describe a rather general model of a random graph by using the generalized scheme of allocation.

Consider the set of all graphs $\Gamma_n(R)$ with $n$ labeled vertices possessing a property $R$. We assume that connectivity is defined for the graphs from this set and that each graph is represented as a union of its connected components. In the formal treatment that follows, it may be helpful to keep in mind the graphs of random mappings or of random permutations. The former graphs consist of components that are connected directed graphs with exactly one cycle, whereas the latter graphs consist only of cycles.

Let $a_n$ denote the number of graphs in the set $\Gamma_n(R)$ and let $b_n$ be the number of connected graphs in $\Gamma_n(R)$. We denote by $\Gamma_{n,N}(R)$ the subset of graphs in $\Gamma_n(R)$ with exactly $N$ connected components. Note that the components of a graph in $\Gamma_{n,N}(R)$ are unordered, and hence we can consider only the symmetric characteristics that do not depend on the order of the components.
To avoid this restriction, we instead consider the set $\tilde{\Gamma}_{n,N}(R)$ of combinatorial objects constructed by means of all possible orderings of the components of each graph from
$\Gamma_{n,N}(R)$. The elements of this set are ordered collections of $N$ components, each of which is a connected graph possessing the property $R$, and the total number of vertices in the components is equal to $n$. Since the vertices of a graph in $\Gamma_{n,N}(R)$ are labeled, all the connected components of the graph are distinct; therefore the number of elements in $\tilde{\Gamma}_{n,N}(R)$ is equal to $N!\,a_{n,N}$, where $a_{n,N}$ is the number of elements of the set $\Gamma_{n,N}(R)$ consisting of the unordered collections of components.

Now let us impose a restriction on the property $R$ of graphs. Let a graph possess the property $R$ if and only if the property holds for each connected component: The property $R$ is then called decomposable.

Set $a_0 = 1$, $b_0 = 0$ and introduce the generating functions

$$A(x) = \sum_{n=0}^{\infty}\frac{a_n x^n}{n!}, \qquad B(x) = \sum_{n=1}^{\infty}\frac{b_n x^n}{n!}.$$

Lemma 1.3.1. If the property $R$ is decomposable, then

$$a_{n,N} = \frac{n!}{N!}\sum\frac{b_{n_1}\cdots b_{n_N}}{n_1!\cdots n_N!}, \tag{1.3.1}$$

where the summation is taken over nonnegative integers $n_1, \dots, n_N$ such that $n_1 + \cdots + n_N = n$.

Proof. With $n_1 + \cdots + n_N = n$ and $n_1, \dots, n_N \ge 1$, let $\tilde{a}_n(n_1, \dots, n_N)$ denote the number of graphs in $\tilde{\Gamma}_{n,N}(R)$ with ordered components of sizes $n_1, \dots, n_N$. We construct all $\tilde{a}_n(n_1, \dots, n_N)$ such graphs and decompose the $n$ labeled vertices into $N$ groups so that there are $n_i$ vertices in the $i$th group, $i = 1, \dots, N$. This can be done in $n!/(n_1!\cdots n_N!)$ ways. From $n_i$ vertices, we construct a connected graph possessing the property $R$; this can be done in $b_{n_i}$ ways. Thus the number of ordered sets of connected components of sizes $n_1, \dots, n_N$ is

$$\tilde{a}_n(n_1, \dots, n_N) = \frac{n!\,b_{n_1}\cdots b_{n_N}}{n_1!\cdots n_N!}.$$

Since $N$ components can be ordered in $N!$ ways, the number $a_n(n_1, \dots, n_N)$ of unordered sets, or the number of graphs in $\Gamma_{n,N}(R)$ having exactly $N$ components of sizes $n_1, \dots, n_N$, is

$$a_n(n_1, \dots, n_N) = \frac{1}{N!}\,\tilde{a}_n(n_1, \dots, n_N) = \frac{n!\,b_{n_1}\cdots b_{n_N}}{N!\,n_1!\cdots n_N!}. \tag{1.3.2}$$

Summing $\tilde{a}_n(n_1, \dots, n_N)$ over all such $n_1, \dots, n_N$ and dividing by $N!$ yields (1.3.1).

Lemma 1.3.2. If the property $R$ is decomposable, then

$$A(x) = e^{B(x)}.$$
Proof. As follows from (1.3.1), the number $a_n$ of all graphs in $\Gamma_n(R)$ is

$$a_n = \sum_{N=1}^{n}\frac{n!}{N!}\sum_{n_1 + \cdots + n_N = n}\frac{b_{n_1}\cdots b_{n_N}}{n_1!\cdots n_N!}. \tag{1.3.3}$$

By dividing both sides of this equality by $n!$, multiplying by $x^n$, and summing over $n$, we get the chain of equalities

$$\sum_{n=1}^{\infty}\frac{a_n x^n}{n!} = \sum_{n=1}^{\infty}\sum_{N=1}^{n}\frac{1}{N!}\sum_{n_1 + \cdots + n_N = n}\frac{b_{n_1}x^{n_1}\cdots b_{n_N}x^{n_N}}{n_1!\cdots n_N!} = \sum_{N=1}^{\infty}\frac{(B(x))^N}{N!} = e^{B(x)} - 1,$$

which proves the lemma.

Let us define the uniform distribution on the set $\Gamma_n(R)$ and consider the random variables $\alpha_m$ equal to the number of components of size $m$ in a random graph from $\Gamma_n(R)$. The total number of components $\nu_n$ of a random graph from $\Gamma_n(R)$ is related to these variables by $\nu_n = \alpha_1 + \cdots + \alpha_n$. Arrange the components in order of nondecreasing sizes and denote by $\beta_m$ the size of the $m$th component in the ordered series; if $m > \nu_n$, set $\beta_m = 0$.

We will also consider the random variables defined on the set $\tilde{\Gamma}_{n,N}(R)$ of ordered sets of $N$ components. The ordered components labeled with the numbers from 1 to $N$ play the role of cells in the generalized scheme of allocating particles. Define the uniform distribution on $\tilde{\Gamma}_{n,N}(R)$ and denote by $\eta_1, \dots, \eta_N$ the sizes of the ordered connected components of a random element in $\tilde{\Gamma}_{n,N}(R)$. It is then clear that

$$P\{\eta_1 = n_1, \dots, \eta_N = n_N\} = \frac{N!\,a_n(n_1, \dots, n_N)}{N!\,a_{n,N}} = \frac{a_n(n_1, \dots, n_N)}{a_{n,N}}. \tag{1.3.4}$$

Theorem 1.3.1. If the series

$$B(x) = \sum_{n=1}^{\infty}\frac{b_n x^n}{n!} \tag{1.3.5}$$

has a nonzero radius of convergence, then the random variables $\eta_1, \dots, \eta_N$ are the contents of cells in the generalized scheme of allocation in which the independent
identically distributed random variables $\xi_1, \dots, \xi_N$ have the distribution

$$P\{\xi_1 = k\} = \frac{b_k x^k}{k!\,B(x)}, \quad k = 1, 2, \dots, \tag{1.3.6}$$

where the positive value $x$ from the domain of convergence of (1.3.5) may be taken arbitrarily.

Proof. Let us find the conditional joint distribution of the random variables $\xi_1, \dots, \xi_N$ with distribution (1.3.6) under the condition $\xi_1 + \cdots + \xi_N = n$. For such random variables,

$$P\{\xi_1 + \cdots + \xi_N = n\} = \frac{x^n}{(B(x))^N}\sum_{n_1 + \cdots + n_N = n}\frac{b_{n_1}\cdots b_{n_N}}{n_1!\cdots n_N!}, \tag{1.3.7}$$

and by virtue of (1.3.1),

$$P\{\xi_1 + \cdots + \xi_N = n\} = \frac{x^n N!}{n!\,(B(x))^N}\,a_{n,N}. \tag{1.3.8}$$

Hence, if $n_1, \dots, n_N \ge 1$ and $n_1 + \cdots + n_N = n$, then

$$P\{\xi_1 = n_1, \dots, \xi_N = n_N \mid \xi_1 + \cdots + \xi_N = n\} = \frac{b_{n_1}\cdots b_{n_N}\,x^n}{n_1!\cdots n_N!\,(B(x))^N\,P\{\xi_1 + \cdots + \xi_N = n\}} = \frac{b_{n_1}\cdots b_{n_N}\,n!}{n_1!\cdots n_N!\,N!\,a_{n,N}},$$

and according to (1.3.2),

$$P\{\xi_1 = n_1, \dots, \xi_N = n_N \mid \xi_1 + \cdots + \xi_N = n\} = \frac{a_n(n_1, \dots, n_N)}{a_{n,N}}. \tag{1.3.9}$$

From (1.3.4) and (1.3.9), we obtain the relation (1.2.2) between the random variables $\eta_1, \dots, \eta_N$ and $\xi_1, \dots, \xi_N$ in the generalized scheme of allocating particles to cells.

In the generalized scheme of allocating particles, we usually study the random variables $\mu_r(n, N)$ equal to the number of cells containing exactly $r$ particles and the order statistics $\eta_{(1)}, \eta_{(2)}, \dots, \eta_{(N)}$ obtained by arranging the contents of cells in nondecreasing order. In this case, $\mu_r(n, N)$ is the number of components of size $r$, and $\eta_{(1)}, \eta_{(2)}, \dots, \eta_{(N)}$ are the sizes of the components in a random element from $\tilde{\Gamma}_{n,N}(R)$ arranged in nondecreasing order. These random variables help in studying distributions of the random variables $\alpha_1, \dots, \alpha_n$ and the associated variables defined on the set $\Gamma_n(R)$ of all graphs possessing the property $R$.
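Theorem 1.3.1 can be checked for small $n$ and $N$ by comparing the counting formula (1.3.4) with the conditional distribution built from (1.3.6). The sketch below is a hedged illustration: the function names are hypothetical, the sequence $b_k$ is passed as a Python callable returning positive integers, and the normalizing constant $B(x)$ is never computed because it cancels in the conditional probability.

```python
from fractions import Fraction
from math import factorial


def compositions(n, N):
    """Yield all ordered N-tuples of positive integers summing to n."""
    if N == 1:
        if n >= 1:
            yield (n,)
        return
    for first in range(1, n - N + 2):
        for rest in compositions(n - first, N - 1):
            yield (first,) + rest


def component_size_pmf(n, N, b):
    """(1.3.4): sizes of the ordered components under the uniform distribution."""
    counts = {}
    for ns in compositions(n, N):
        # n! b_{n_1} ... b_{n_N} / (n_1! ... n_N!) ordered collections with these sizes
        c = Fraction(factorial(n))
        for k in ns:
            c *= Fraction(b(k), factorial(k))
        counts[ns] = c
    total = sum(counts.values())  # number of ordered collections, N! a_{n,N}
    return {ns: c / total for ns, c in counts.items()}


def conditional_pmf(n, N, b, x):
    """Law of (xi_1, ..., xi_N) from (1.3.6), conditioned on xi_1 + ... + xi_N = n."""
    weights = {}
    for ns in compositions(n, N):
        w = Fraction(1)
        for k in ns:
            # Unnormalized weight b_k x^k / k!; the factor 1/B(x) cancels below.
            w *= Fraction(b(k), factorial(k)) * x ** k
        weights[ns] = w
    total = sum(weights.values())
    return {ns: w / total for ns, w in weights.items()}
```

With `b = lambda k: factorial(k - 1)` (cycles of permutations) or `b = lambda k: 1` (blocks of set partitions), `conditional_pmf(6, 3, b, Fraction(1, 2))` coincides with `component_size_pmf(6, 3, b)`, and the result does not change when `x` is replaced by another positive rational.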
Lemma 1.3.3. For any positive $x$ from the domain of convergence of (1.3.5),

$$P\{\nu_n = N\} = \frac{n!\,(B(x))^N}{N!\,x^n\,a_n}\,P\{\xi_1 + \cdots + \xi_N = n\}. \tag{1.3.10}$$

Proof. Relation (1.3.10) follows from (1.3.8) because $P\{\nu_n = N\} = a_{n,N}/a_n$ by definition.

It is clear by virtue of (1.3.3) that the number $a_n$ can also be expressed in terms of probabilities related to $\xi_1, \dots, \xi_N$:

$$a_n = \frac{n!}{x^n}\sum_{N=1}^{n}\frac{(B(x))^N}{N!}\,P\{\xi_1 + \cdots + \xi_N = n\}. \tag{1.3.11}$$

Lemma 1.3.4. For any nonnegative integers $N, m_1, \dots, m_n$,

$$P\{\alpha_1 = m_1, \dots, \alpha_n = m_n \mid \nu_n = N\} = P\{\mu_1(n, N) = m_1, \dots, \mu_n(n, N) = m_n\}.$$

Proof. The conditional distribution on $\Gamma_n(R)$ under the condition $\nu_n = N$ is concentrated on the set $\Gamma_{n,N}(R)$ of graphs having exactly $N$ connected components and is uniform on this set. Hence,

$$P\{\alpha_1 = m_1, \dots, \alpha_n = m_n \mid \nu_n = N\} = \frac{c_N(m_1, \dots, m_n)}{a_{n,N}}, \tag{1.3.12}$$

where $a_{n,N}$ is the number of elements in $\Gamma_{n,N}(R)$ and $c_N(m_1, \dots, m_n)$ is the number of graphs in $\Gamma_{n,N}(R)$ such that the number of components of size $r$ is $m_r$, $r = 1, 2, \dots, n$.

Consider the above set $\tilde{\Gamma}_{n,N}(R)$ composed of ordered sets of $N$ components. Let $\tilde{c}_N(m_1, \dots, m_n)$ denote the number of elements in $\tilde{\Gamma}_{n,N}(R)$ such that the number of components of size $r$ is $m_r$, $r = 1, 2, \dots, n$. It is clear that

$$P\{\mu_1(n, N) = m_1, \dots, \mu_n(n, N) = m_n\} = \frac{\tilde{c}_N(m_1, \dots, m_n)}{\tilde{a}_{n,N}}, \tag{1.3.13}$$

where $\tilde{a}_{n,N}$ is the number of elements in $\tilde{\Gamma}_{n,N}(R)$. The assertion of the lemma follows from (1.3.12) and (1.3.13) because $\tilde{a}_{n,N} = N!\,a_{n,N}$ and $\tilde{c}_N(m_1, \dots, m_n) = N!\,c_N(m_1, \dots, m_n)$.

Thus, if the series (1.3.5) has a nonzero radius of convergence, then all of the random variables expressed by $\alpha_1, \dots, \alpha_n$ can be studied by using the generalized scheme of allocating particles in which the random variables $\xi_1, \dots, \xi_N$ have the distribution (1.3.6). Roughly speaking, under the condition that the number $\nu_n$ of connected components of the graph $\Gamma_n(R)$ is $N$, the sizes of these components (under a random ordering) have the same joint distribution as the random variables
$\eta_1, \dots, \eta_N$ in the generalized scheme of allocating particles that are defined by the independent random variables $\xi_1, \dots, \xi_N$ with distribution (1.3.6).

Thus, for $\nu_n = N$ the random variables $\beta_1, \dots, \beta_N$ are expressed in terms of $\alpha_1, \dots, \alpha_n$ in exactly the same way as the order statistics $\eta_{(1)}, \dots, \eta_{(N)}$ in the generalized scheme of allocating particles are expressed in terms of $\mu_1(n, N), \dots, \mu_n(n, N)$. Hence, Lemma 1.3.4 implies the following assertion.

Lemma 1.3.5. For any nonnegative integers $N, k_1, \dots, k_N$,

$$P\{\beta_1 = k_1, \dots, \beta_N = k_N \mid \nu_n = N\} = P\{\eta_{(1)} = k_1, \dots, \eta_{(N)} = k_N\}. \tag{1.3.14}$$

We now consider the joint distribution of $\mu_1(n, N), \dots, \mu_n(n, N)$.

Lemma 1.3.6. For nonnegative integers $m_1, \dots, m_n$ such that $m_1 + \cdots + m_n = N$ and $m_1 + 2m_2 + \cdots + nm_n = n$,

$$P\{\mu_1(n, N) = m_1, \dots, \mu_n(n, N) = m_n\} = \frac{n!}{a_n\,P\{\nu_n = N\}}\prod_{r=1}^{n}\frac{b_r^{m_r}}{m_r!\,(r!)^{m_r}}. \tag{1.3.15}$$

Proof. To obtain (1.3.15), it suffices to calculate $\tilde{c}_N(m_1, \dots, m_n)$ in (1.3.13). It is clear that

$$\tilde{c}_N(m_1, \dots, m_n) = \sum\tilde{a}_n(n_1, \dots, n_N),$$

where the summation is taken over all sets $(n_1, \dots, n_N)$ containing the element $r$ exactly $m_r$ times, $r = 1, \dots, n$. The number of such sets is $N!/(m_1!\cdots m_n!)$, and for each of them, by (1.3.2),

$$\tilde{a}_n(n_1, \dots, n_N) = \frac{n!\,b_1^{m_1}\cdots b_n^{m_n}}{(1!)^{m_1}\cdots(n!)^{m_n}}.$$

Hence,

$$\tilde{c}_N(m_1, \dots, m_n) = \frac{N!\,n!\,b_1^{m_1}\cdots b_n^{m_n}}{m_1!\cdots m_n!\,(1!)^{m_1}\cdots(n!)^{m_n}}.$$

To obtain formula (1.3.15), it remains to note that

$$P\{\nu_n = N\} = \frac{a_{n,N}}{a_n} = \frac{\tilde{a}_{n,N}}{N!\,a_n}.$$

Lemmas 1.3.4 and 1.3.6 enable us to express the joint distribution of the random variables $\alpha_1, \dots, \alpha_n$ in a random graph from $\Gamma_n(R)$.
Lemma 1.3.7. If $m_1, \dots, m_n$ are nonnegative integers, then

$$P\{\alpha_1 = m_1, \dots, \alpha_n = m_n\} = \begin{cases}\displaystyle\frac{n!}{a_n}\prod_{r=1}^{n}\frac{b_r^{m_r}}{m_r!\,(r!)^{m_r}} & \text{if } m_1 + 2m_2 + \cdots + nm_n = n,\\[2ex] 0 & \text{otherwise.}\end{cases}$$

Proof. By the total probability formula,

$$P\{\alpha_1 = m_1, \dots, \alpha_n = m_n\} = \sum_{k=1}^{n} P\{\nu_n = k\}\,P\{\alpha_1 = m_1, \dots, \alpha_n = m_n \mid \nu_n = k\} = P\{\nu_n = N\}\,P\{\alpha_1 = m_1, \dots, \alpha_n = m_n \mid \nu_n = N\},$$

where $N = m_1 + \cdots + m_n$. By using Lemma 1.3.4, we find that

$$P\{\alpha_1 = m_1, \dots, \alpha_n = m_n\} = P\{\nu_n = N\}\,P\{\mu_1(n, N) = m_1, \dots, \mu_n(n, N) = m_n\}. \tag{1.3.16}$$

It remains to note that $P\{\mu_1(n, N) = m_1, \dots, \mu_n(n, N) = m_n\} = 0$ if $m_1 + 2m_2 + \cdots + nm_n \ne n$ and that equality (1.3.15) from Lemma 1.3.6 holds for the probability $P\{\mu_1(n, N) = m_1, \dots, \mu_n(n, N) = m_n\}$ if $m_1 + \cdots + m_n = N$ and $m_1 + 2m_2 + \cdots + nm_n = n$. The substitution of (1.3.15) into (1.3.16) proves Lemma 1.3.7.

We now turn to some examples.

Example 1.3.1. The set $S_n$ of one-to-one mappings corresponds to the set $\Gamma_n(R)$ of graphs with $n$ vertices for which we have the property $R$: Graphs are directed with exactly one arc entering each vertex and exactly one arc emanating from each vertex. This property is decomposable. The connected components of such a graph are (directed) cycles. In this case, $a_n = n!$, $b_n = (n - 1)!$, and the generating functions

$$A(x) = \frac{1}{1 - x}, \qquad B(x) = -\log(1 - x)$$

satisfy the relations of Lemma 1.3.2:

$$A(x) = e^{B(x)}. \tag{1.3.17}$$

To study the lengths of cycles of a random permutation and the associated variables, one can use the generalized scheme of allocating particles in which the random variables $\xi_1, \dots, \xi_N$ have the distribution

$$P\{\xi_1 = k\} = -\frac{x^k}{k\log(1 - x)}, \quad k = 1, 2, \dots, \quad 0 < x < 1.$$
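For permutations, Lemma 1.3.7 with $a_n = n!$ and $b_r = (r - 1)!$ can be compared with a brute-force count over all permutations of a small set. The following sketch is illustrative only; the function names are assumptions, and a permutation of $\{0, \dots, n-1\}$ is represented as a tuple of images.

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations
from math import factorial


def cycle_counts(perm):
    """Return (m_1, ..., m_n), where m_r is the number of cycles of length r."""
    n = len(perm)
    seen = [False] * n
    m = [0] * n
    for start in range(n):
        if not seen[start]:
            length, j = 0, start
            while not seen[j]:  # walk once around the cycle containing start
                seen[j] = True
                j = perm[j]
                length += 1
            m[length - 1] += 1
    return tuple(m)


def cycle_count_pmf(n):
    """Joint distribution of (alpha_1, ..., alpha_n) by enumerating all n! permutations."""
    counts = Counter(cycle_counts(p) for p in permutations(range(n)))
    return {m: Fraction(c, factorial(n)) for m, c in counts.items()}


def lemma_1_3_7_pmf(ms):
    """Formula of Lemma 1.3.7 specialized to a_n = n!, b_r = (r - 1)!."""
    n = len(ms)
    if sum(r * m_r for r, m_r in enumerate(ms, start=1)) != n:
        return Fraction(0)
    prob = Fraction(1)  # n! / a_n = 1 for permutations
    for r, m_r in enumerate(ms, start=1):
        prob *= Fraction(factorial(r - 1) ** m_r,
                         factorial(m_r) * factorial(r) ** m_r)
    return prob
```

For `n = 5` the dictionary returned by `cycle_count_pmf(5)` agrees entry by entry with `lemma_1_3_7_pmf`. In this special case each factor reduces to $1/(m_r!\,r^{m_r})$.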
Example 1.3.2. The set $\Sigma_n$ of all single-valued mappings corresponds to the set $\Gamma_n(R)$ of graphs with $n$ vertices with property $R$: The graphs are directed with exactly one arc emanating from each vertex. This property is decomposable. Since the number of elements of $\Sigma_n$ is $n^n$, from relation (1.3.17) for the generating functions we find that

$$B(x) = \log A(x) = \log\sum_{n=0}^{\infty}\frac{n^n x^n}{n!},$$

yielding

$$b_n = (n - 1)!\sum_{k=0}^{n-1}\frac{n^k}{k!}.$$

The radius of convergence of $A(x)$ and $B(x)$ is $e^{-1}$, and at the point $x = e^{-1}$, they diverge. To study the characteristics of a random mapping, we can use the generalized scheme of allocating particles in which the random variables $\xi_1, \dots, \xi_N$ have the distribution

$$P\{\xi_1 = k\} = \frac{b_k x^k}{k!\,B(x)}, \quad k = 1, 2, \dots, \quad 0 < x < e^{-1}.$$

Example 1.3.3. Consider the set of all unordered partitions of the set $X_n = \{1, 2, \dots, n\}$ into disjoint subsets, the union of which is $X_n$. The partition of $X_n$ into unordered subsets $Y_1, \dots, Y_N$ corresponds to the hypergraph of $\Gamma_{n,N}(R)$ with $n$ vertices and $N$ hyperedges $Y_1, \dots, Y_N$. Since all of the $N!$ orderings of the hyperedges $Y_1, \dots, Y_N$ are distinct, each hypergraph of $\Gamma_{n,N}(R)$ gives us $N!$ distinct objects of $\tilde{\Gamma}_{n,N}(R)$ that are hypergraphs with $n$ vertices and $N$ ordered hyperedges $A_1, \dots, A_N$, with the sets of hyperedges being permutations of $Y_1, \dots, Y_N$. The property $R$ determining this class of graphs requires that a graph be a hypergraph whose distinct hyperedges have no common vertices. Each connected component of such a graph is a hyperedge. Clearly, the number of connected graphs possessing the property $R$ with $n$ vertices is 1, that is, $b_n = 1$, so

$$B(x) = \sum_{n=1}^{\infty}\frac{x^n}{n!} = e^x - 1.$$

Since $R$ is decomposable,

$$A(x) = e^{e^x - 1}.$$

This equality, or (1.3.3), yields

$$a_n = \sum_{N=1}^{n}\frac{1}{N!}\sum_{n_1 + \cdots + n_N = n}\frac{n!}{n_1!\cdots n_N!},$$

where the second summation is over positive integers $n_1, \dots, n_N$.
Thus, to study random partitions, we can use the generalized scheme of allocation in which the random variables $\xi_1, \dots, \xi_N$ have the truncated Poisson distribution
\[ P\{\xi_1 = k\} = \frac{x^k}{k!\,(e^x - 1)}, \qquad k = 1, 2, \dots, \quad 0 < x < \infty. \]

Example 1.3.4. A tree is a connected graph without cycles. As the set $\Gamma_{n,N}$, let us consider the set $\mathcal{F}_{n,N}$ of all forests consisting of $N$ trees with the total number $n$ of labeled vertices. The trees in a forest are not ordered. The property $R$ determining this class of graphs requires that a graph be undirected without cycles. The property $R$ is decomposable. The number $b_n$ of connected graphs possessing the property $R$ is the number of nonrooted trees with $n$ vertices and $b_n = n^{n-2}$, so the generating function is
\[ B(x) = \sum_{n=1}^{\infty} \frac{n^{n-2} x^n}{n!}, \qquad 0 < x \le e^{-1}. \]
Thus, to study a random forest from $\mathcal{F}_{n,N}$, we can use the generalized scheme of allocation in which the random variables $\xi_1, \dots, \xi_N$ have the distribution
\[ P\{\xi_1 = k\} = \frac{k^{k-2} x^k}{k!\, B(x)}, \qquad k = 1, 2, \dots, \quad 0 < x \le e^{-1}. \]

1.4. Forests of nonrooted trees

The graphs consisting of nonrooted trees and unicyclic components play the same role in investigating graphs as the forests of rooted trees do for graphs of mappings. Hence, the following sections concentrate on these objects, using the generalized scheme of allocation.

As in Example 1.3.4, let $\mathcal{F}_{n,N}$ be the set of all forests of $N$ nonrooted trees with $n$ vertices. It is known that the number of forests of $N$ ordered rooted trees with total number $n$ of nonroot vertices is $N(N+n)^{n-1}$. In contrast to the forests of rooted trees, there is no simple formula for the number $F_{n,N} = |\mathcal{F}_{n,N}|$ of forests of nonrooted trees. Therefore the first step is to study the asymptotic behavior of $F_{n,N}$. Denote by $T$ the number of edges in a forest belonging to $\mathcal{F}_{n,N}$. It is easy to see that $T = n - N$.
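The count $N(N+n)^{n-1}$ quoted above for forests of rooted trees can be confirmed by brute force in small cases. The sketch assumes the following model, which is an interpretation rather than a definition taken from the text: the $N$ roots carry the labels $0, \dots, N-1$, the $n$ nonroot vertices carry the labels $N, \dots, N+n-1$, and a forest is a choice of parent for every nonroot vertex such that following parents always ends at a root. The function name `count_rooted_forests` is hypothetical.

```python
from itertools import product


def count_rooted_forests(n, N):
    """Count parent assignments of n nonroot vertices that form a forest on N fixed roots."""
    total = N + n
    count = 0
    for parents in product(range(total), repeat=n):
        valid = True
        for v in range(N, total):
            steps, u = 0, v
            # A path to a root visits at most n nonroot vertices;
            # needing more steps means the walk is trapped in a cycle.
            while u >= N and steps <= n:
                u = parents[u - N]
                steps += 1
            if u >= N:
                valid = False
                break
        count += valid
    return count


for n in range(1, 5):
    for N in range(1, 4):
        assert count_rooted_forests(n, N) == N * (N + n) ** (n - 1)
```

The enumeration visits $(N+n)^n$ assignments, so it is only meant for very small $n$ and $N$.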
Following the general algorithm for applying the generalized scheme of allocation, let us consider the set $\bar{\mathcal{F}}_{n,N}$ of forests that consist of $N$ ordered nonrooted trees, and define the uniform distribution on this set. Denote by $\eta_1, \dots, \eta_N$ the sizes of ordered trees in a random graph from $\bar{\mathcal{F}}_{n,N}$. By Cayley's formula for counting trees, the number $b_n$ of nonrooted trees with $n$ vertices is $n^{n-2}$. Denote by $a_n(n_1, \dots, n_N)$ the number of elements in $\bar{\mathcal{F}}_{n,N}$ for which $\{\eta_1 = n_1, \dots, \eta_N = n_N\}$. It is easy to see that for positive integers $n_1, \dots, n_N$ with $n_1 + \dots + n_N = n$,
\[ a_n(n_1, \dots, n_N) = \frac{n!}{n_1! \cdots n_N!}\, b_{n_1} \cdots b_{n_N}, \tag{1.4.1} \]
and the number of elements in $\bar{\mathcal{F}}_{n,N}$ is
\[ \sum_{n_1 + \dots + n_N = n} a_n(n_1, \dots, n_N) = \sum_{n_1 + \dots + n_N = n} \frac{n!\, b_{n_1} \cdots b_{n_N}}{n_1! \cdots n_N!}. \]
Thus, for the number of forests $F_{n,N}$, we obtain the formula
\[ F_{n,N} = \frac{n!}{N!} \sum_{n_1 + \dots + n_N = n} \frac{n_1^{n_1 - 2} \cdots n_N^{n_N - 2}}{n_1! \cdots n_N!}, \tag{1.4.2} \]
where the summation is over positive integers $n_1, \dots, n_N$ such that $n_1 + \dots + n_N = n$.

Introduce independent identically distributed random variables $\xi_1, \dots, \xi_N$ for which
\[ P\{\xi_1 = k\} = \frac{b_k x^k}{k!\, B(x)} = \frac{k^{k-2} x^k}{k!\, B(x)}, \qquad k = 1, 2, \dots, \tag{1.4.3} \]
where
\[ B(x) = \sum_{k=1}^{\infty} \frac{b_k x^k}{k!} = \sum_{k=1}^{\infty} \frac{k^{k-2} x^k}{k!}, \qquad 0 < x \le e^{-1}. \tag{1.4.4} \]
In accordance with the results of the previous section and Example 1.3.4, the generalized scheme of allocation can be applied to investigating random forests of nonrooted trees, that is, relation (1.2.2) is valid: For any integers $n_1, \dots, n_N$,
\[ P\{\eta_1 = n_1, \dots, \eta_N = n_N\} = P\{\xi_1 = n_1, \dots, \xi_N = n_N \mid \xi_1 + \dots + \xi_N = n\}. \]
For the number of forests $F_{n,N}$, formula (1.3.8) is valid, which, of course, can be obtained directly from (1.4.2) and (1.4.3):
\[ F_{n,N} = \frac{n!\,(B(x))^N}{N!\, x^n}\, P\{\xi_1 + \dots + \xi_N = n\}, \tag{1.4.5} \]
where $B(x)$ is defined by (1.4.4), and the value of the parameter $x$ in the distribution (1.4.3) of the random variables $\xi_1, \dots, \xi_N$ can be chosen arbitrarily from the domain of convergence of the series $B(x)$.

Thus, to obtain the asymptotics of $F_{n,N}$, it is sufficient to choose an appropriate value of $x$, $0 < x \le e^{-1}$, and analyze the asymptotic behavior of the probability $P\{\xi_1 + \dots + \xi_N = n\}$ for the sum of the random variables $\xi_1, \dots, \xi_N$ that have the distribution (1.4.3) with the chosen value of the parameter $x$.
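Formula (1.4.2) can be evaluated exactly for small $n$ and $N$ and compared with a direct enumeration of forests. The sketch below is only an illustration under stated assumptions: `forests_by_formula` computes the coefficient of $x^n$ in $(B(x))^N$ with rational arithmetic, and `forests_by_enumeration` counts acyclic edge sets with $n - N$ edges on $n$ labeled vertices, which are precisely the forests with $N$ trees. Both function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations
from math import factorial


def forests_by_formula(n, N):
    """F_{n,N} from (1.4.2): n!/N! times the coefficient of x^n in B(x)^N."""
    # w[k] = b_k / k! with b_k = k^(k-2) by Cayley's formula (b_1 = 1)
    w = [Fraction(0)] + [Fraction(1 if k == 1 else k ** (k - 2), factorial(k))
                         for k in range(1, n + 1)]
    coeff = [Fraction(1)] + [Fraction(0)] * n        # coefficients of B(x)^0
    for _ in range(N):                               # multiply by B(x), truncated at x^n
        new = [Fraction(0)] * (n + 1)
        for i, c in enumerate(coeff):
            if c:
                for k in range(1, n - i + 1):
                    new[i + k] += c * w[k]
        coeff = new
    value = coeff[n] * factorial(n) / factorial(N)
    assert value.denominator == 1
    return int(value)


def find(parent, v):
    """Union-find root lookup with path halving."""
    while parent[v] != v:
        parent[v] = parent[parent[v]]
        v = parent[v]
    return v


def forests_by_enumeration(n, N):
    """Count edge sets with n - N edges on n labeled vertices that contain no cycle."""
    edges = list(combinations(range(n), 2))
    count = 0
    for chosen in combinations(edges, n - N):
        parent = list(range(n))
        acyclic = True
        for u, v in chosen:
            ru, rv = find(parent, u), find(parent, v)
            if ru == rv:             # the edge would close a cycle
                acyclic = False
                break
            parent[ru] = rv
        count += acyclic
    return count


for n in range(1, 7):
    for N in range(1, n + 1):
        assert forests_by_formula(n, N) == forests_by_enumeration(n, N)
```

For $N = 1$ the formula reduces to Cayley's $n^{n-2}$, and for $n = 4$, $N = 2$ it gives 15, the number of ways to choose two of the six possible edges.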
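The first two moments of the random variable $\xi_1$ have the following expressions:
\[ E\xi_1 = \frac{1}{B(x)} \sum_{k=1}^{\infty} \frac{k^{k-1} x^k}{k!}, \qquad E\xi_1^2 = \frac{1}{B(x)} \sum_{k=1}^{\infty} \frac{k^k x^k}{k!}. \]
Therefore, along with $B(x)$, we consider two functions
\[ \theta(x) = \sum_{k=1}^{\infty} \frac{k^{k-1} x^k}{k!}, \qquad a(x) = \sum_{k=1}^{\infty} \frac{k^k x^k}{k!}. \]
The function $\theta(x)$ is the solution of the equation
\[ \theta e^{-\theta} = x \tag{1.4.6} \]
if we choose the solution that is less than 1. The functions $a(x)$ and $B(x)$ can be represented in terms of this function. Differentiating (1.4.6) gives
\[ \theta'(x) e^{-\theta(x)} - \theta(x)\theta'(x) e^{-\theta(x)} = 1; \]
hence, since $e^{\theta(x)} = \theta(x)/x$,
\[ \theta'(x) = \frac{\theta(x)}{x(1 - \theta(x))}. \tag{1.4.7} \]
On the other hand, $a(x) = x\theta'(x)$. Thus
\[ a(x) = \frac{\theta(x)}{1 - \theta(x)}. \tag{1.4.8} \]
Slightly more complicated calculations are needed to obtain the relation
\[ B(x) = \tfrac{1}{2}\bigl(1 - (1 - \theta(x))^2\bigr). \tag{1.4.9} \]
Consider the function $h(x) = (1 - \theta(x))^2$. By using (1.4.7), we obtain
\[ h'(x) = -2(1 - \theta(x))\theta'(x) = -\frac{2\theta(x)}{x} = -2 \sum_{k=1}^{\infty} \frac{k^{k-1} x^{k-1}}{k!}. \]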
When we integrate both sides of this equality, we obtain
\[ \int_0^x h'(t)\,dt = h(x) - 1 = -2 \sum_{k=1}^{\infty} \frac{k^{k-1}}{k!} \int_0^x t^{k-1}\,dt = -2 \sum_{k=1}^{\infty} \frac{k^{k-2} x^k}{k!} = -2B(x), \]
which implies equality (1.4.9).

Relations (1.4.8) and (1.4.9) allow us to calculate the mean $E\xi_1$ and the variance $D\xi_1$. For $0 < \theta < 1$, we set $x = \theta e^{-\theta}$. For such a choice of the parameter $x$,
\[ \theta(x) = \theta, \qquad a(x) = \frac{\theta}{1 - \theta}, \qquad B(x) = \frac{\theta(2 - \theta)}{2}; \]
therefore
\[ m = E\xi_1 = \frac{\theta(x)}{B(x)} = \frac{2}{2 - \theta}, \qquad \sigma^2 = D\xi_1 = \frac{a(x)}{B(x)} - \left(\frac{\theta(x)}{B(x)}\right)^2 = \frac{2\theta}{(1 - \theta)(2 - \theta)^2}. \]
If the parameter $\theta$ is fixed, then Theorem 1.1.11 may be applied to the sum $\zeta_N = \xi_1 + \dots + \xi_N$. In fact, the theorem on local convergence to the normal law is valid in a wider region.

Theorem 1.4.1. If $N \to \infty$ and $\theta = \theta(N)$ varies such that $\theta N \to \infty$ and $N(1 - \theta)^3 \to \infty$, then
\[ P\{\zeta_N = k\} = \frac{1}{\sigma\sqrt{2\pi N}}\, e^{-u^2/2}\,(1 + o(1)) \]
uniformly in the integers $k$ such that $u = (k - Nm)/(\sigma\sqrt{N})$ lies in any fixed finite interval.

Proof. First we prove that, under the conditions of the theorem, the distribution of $(\zeta_N - mN)/(\sigma\sqrt{N})$ converges weakly to the normal distribution with parameters $(0, 1)$. According to Theorem 1.1.9, it is sufficient to demonstrate convergence of the corresponding characteristic function $\varphi_N(t)$ to the characteristic function $e^{-t^2/2}$ of the standard normal distribution.
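Before following the proof, the closed forms for $B(x)$, $m$, and $\sigma^2$ and the statement of Theorem 1.4.1 can be sanity-checked numerically. The sketch below is an illustration under stated assumptions: it takes $\theta = 0.5$ and $N = 200$, truncates the distribution (1.4.3) at $K = 600$ terms (the neglected tail is far below floating-point precision for this $\theta$), and computes $P\{\zeta_N = k\}$ by repeated polynomial multiplication. The names `pmf`, `poly_mul`, `sum_pmf`, and `check` are hypothetical helpers.

```python
from math import exp, lgamma, log, pi, sqrt


def pmf(theta, K):
    """p[k] = P{xi_1 = k}, k = 1..K, for distribution (1.4.3) with x = theta*exp(-theta).

    Uses B(x) = theta*(2 - theta)/2, so p_k = 2 k^(k-2) theta^(k-1) e^(-k theta) / (k! (2 - theta)).
    """
    p = [0.0] * (K + 1)
    for k in range(1, K + 1):
        log_pk = (log(2.0) + (k - 2) * log(k) + (k - 1) * log(theta)
                  - k * theta - lgamma(k + 1) - log(2.0 - theta))
        p[k] = exp(log_pk)
    return p


def poly_mul(a, b, K):
    """Product of two coefficient lists, truncated after degree K."""
    out = [0.0] * (K + 1)
    for i, ai in enumerate(a):
        if ai == 0.0:
            continue
        for j in range(K - i + 1):
            out[i + j] += ai * b[j]
    return out


def sum_pmf(p, N, K):
    """Distribution of zeta_N = xi_1 + ... + xi_N on 0..K via binary exponentiation."""
    result = [1.0] + [0.0] * K
    base = p[:]
    while N:
        if N & 1:
            result = poly_mul(result, base, K)
        N >>= 1
        if N:
            base = poly_mul(base, base, K)
    return result


def check(theta=0.5, N=200, K=600):
    p = pmf(theta, K)
    total = sum(p)                                   # should be 1 if B(x) = theta(2-theta)/2
    mean = sum(k * pk for k, pk in enumerate(p))
    var = sum(k * k * pk for k, pk in enumerate(p)) - mean ** 2
    m = 2.0 / (2.0 - theta)
    sigma2 = 2.0 * theta / ((1.0 - theta) * (2.0 - theta) ** 2)
    k = round(N * m)                                 # integer closest to the mean of zeta_N
    exact = sum_pmf(p[:k + 1], N, k)[k]
    u = (k - N * m) / sqrt(sigma2 * N)
    approx = exp(-u * u / 2.0) / sqrt(2.0 * pi * sigma2 * N)
    return total, mean, m, var, sigma2, exact / approx


total, mean, m, var, sigma2, ratio = check()
print(f"sum p_k = {total:.12f}, mean = {mean:.9f} vs {m:.9f}, "
      f"variance = {var:.9f} vs {sigma2:.9f}, exact/normal = {ratio:.4f}")
```

The last printed ratio compares the exact probability with the normal approximation of the theorem near the mean; it should be close to 1 for moderately large $N$, although the rate of convergence depends on $\theta$.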
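The characteristic function of $\xi_1$ equals
\[ \varphi(t) = \frac{1}{B(x)} \sum_{k=1}^{\infty} \frac{k^{k-2} x^k e^{itk}}{k!} = \frac{B(xe^{it})}{B(x)}. \]
By virtue of (1.4.7), (1.4.8), and (1.4.9),
\[ B(x) = \tfrac{1}{2}\bigl(1 - (1 - \theta(x))^2\bigr), \quad xB'(x) = \theta(x), \quad x^2 B''(x) = \theta^2(x)(1 - \theta(x))^{-1}, \quad x^3 B'''(x) = \theta^3(x)(3 - 2\theta(x))(1 - \theta(x))^{-3}. \]
Therefore
\[ \varphi'(t) = \frac{i x e^{it} B'(xe^{it})}{B(x)} = \frac{i\theta(xe^{it})}{B(x)}, \]
\[ \varphi''(t) = -\frac{\theta(xe^{it})}{B(x)(1 - \theta(xe^{it}))}, \tag{1.4.10} \]
\[ \varphi'''(t) = -\frac{i\theta(xe^{it})}{B(x)(1 - \theta(xe^{it}))^3}. \]
For $x = \theta e^{-\theta}$,
\[ \theta(x) = \theta, \qquad B(x) = \frac{\theta(2 - \theta)}{2}. \]
Denote by $\psi(t)$ the characteristic function of the centered random variable $\xi_1 - \theta(x)/B(x)$. Then
\[ \psi'(0) = 0, \qquad \psi''(0) = -\sigma^2 = -\frac{2\theta}{(1 - \theta)(2 - \theta)^2}. \tag{1.4.11} \]
Consider the logarithm $\log\psi(t)$. It is not difficult to check that
\[ (\log\psi(t))''' = \frac{2i\,\theta(xe^{it})\bigl(2\theta^2(xe^{it}) - \theta(xe^{it}) - 2\bigr)}{(1 - \theta(xe^{it}))^3 (2 - \theta(xe^{it}))^3}. \]
Therefore, if $x = \theta e^{-\theta}$, then there exists a constant $c$ such that
\[ |(\log\psi(t))'''| \le \frac{c\,\theta}{(1 - \theta)^3} \]
and
\[ \psi(t) = \exp\left\{ -\frac{\sigma^2 t^2}{2} + O\!\left(\frac{\theta |t|^3}{(1 - \theta)^3}\right) \right\}. \tag{1.4.12} \]
The characteristic function $\varphi_N(t)$ of the random variable $(\zeta_N - mN)/(\sigma\sqrt{N})$ satisfies the equality $\varphi_N(t) = \psi^N(t/(\sigma\sqrt{N}))$; hence, for any fixed $t$, as $N \to \infty$,
\[ \varphi_N(t) = \exp\left\{ -\frac{t^2}{2} + O\!\left(\frac{|t|^3}{\sqrt{N\theta(1 - \theta)^3}}\right) \right\}. \tag{1.4.13} \]
The conditions of the theorem specify that $N\theta \to \infty$, $N(1 - \theta)^3 \to \infty$; hence, for any fixed $t$, as $N \to \infty$,
\[ \varphi_N(t) \to e^{-t^2/2}, \]
and the distribution of $(\zeta_N - mN)/(\sigma\sqrt{N})$ converges weakly to the standard normal law.

To prove the local convergence of these distributions, we need additional estimates of the characteristic function $\varphi(t)$. It is reasonable to assume that the local theorem is valid in the same regions as the integral theorem proved above, but the necessary estimates are complicated to find, and therefore we restrict ourselves to a proof of the local theorem only in the case where $\theta \le \theta_0 < 1$ and $\theta N \to \infty$.

From (1.4.12), it follows that there exists $\varepsilon > 0$ such that for $|t| \le \varepsilon$ and $\theta \le \theta_0 < 1$,
\[ |\psi(t)| \le e^{-c\sigma^2 t^2}. \tag{1.4.14} \]
We now show that for any $\varepsilon$, $0 < \varepsilon < \pi$, there exists a positive constant $c$ such that for $\varepsilon \le |t| \le \pi$ and $0 < \theta \le 1$,
\[ |\varphi(t)| \le e^{-c\theta}. \tag{1.4.15} \]
If $\theta \to 0$, then
\[ x = \theta e^{-\theta} = \theta - \theta^2 + O(\theta^3), \]
\[ \varphi(t) = \frac{B(xe^{it})}{B(x)} = \frac{xe^{it} + x^2 e^{2it}/2 + O(\theta^3)}{\theta(1 - \theta/2)} = e^{it} + (e^{2it} - e^{it})\theta/2 + O(\theta^2). \]
Now $|1 + (e^{it} - 1)\theta/2|^2 = 1 - \theta(1 - \cos t) + O(\theta^2)$ as $\theta \to 0$; therefore
\[ |\varphi(t)|^2 = 1 - 2\theta \sin^2(t/2) + O(\theta^2) \]
uniformly in $t$, and for $\varepsilon \le |t| \le \pi$ there exist $\delta > 0$ and $c_1 > 0$ such that
\[ |\varphi(t)| \le e^{-c_1\theta} \tag{1.4.16} \]
for $\theta \le \delta$. For any $\theta$, $0 < \theta \le 1$, the distribution of $\xi_1$ has maximal span 1 and $\varphi(t)$ is continuous in $t$ and $\theta$ in the region
\[ B = \{(t, \theta) : \varepsilon \le |t| \le \pi,\ 0 < \delta \le \theta \le 1\}. \]
Therefore
\[ q = \sup_{B} |\varphi(t)| < 1, \]
and there exists $c_2 > 0$ such that
\[ |\varphi(t)| \le e^{-c_2\theta} \tag{1.4.17} \]
for $(t, \theta) \in B$. This estimate and (1.4.16) imply (1.4.15).

Proving the local theorem, we follow the proof of Theorem 1.1.11 as a model for similar proofs. We set
\[ u = \frac{k - mN}{\sigma\sqrt{N}} \]
and represent the difference
\[ R_N = 2\pi\left( \sigma\sqrt{N}\, P\{\zeta_N = k\} - \frac{1}{\sqrt{2\pi}}\, e^{-u^2/2} \right) \]
as a sum of the following four integrals:
\[ I_1 = \int_{-A}^{A} e^{-itu}\bigl(\varphi_N(t) - e^{-t^2/2}\bigr)\,dt, \qquad I_2 = -\int_{A < |t|} e^{-itu - t^2/2}\,dt, \]
\[ I_3 = \int_{A < |t| \le \varepsilon\sigma\sqrt{N}} e^{-itu}\bigl(\psi(t/(\sigma\sqrt{N}))\bigr)^N\,dt, \qquad I_4 = \int_{\varepsilon\sigma\sqrt{N} < |t| \le \pi\sigma\sqrt{N}} e^{-itu}\bigl(\psi(t/(\sigma\sqrt{N}))\bigr)^N\,dt, \]
where the constants $A$ and $\varepsilon$ will be chosen later. To see that $R_N \to 0$ as $N \to \infty$, we show that $R_N$ can be made arbitrarily small by the choice of $\varepsilon$, $A$, and $N$. It is clear that
\[ |I_2| \le \int_{A < |t|} e^{-t^2/2}\,dt, \]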
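and $|I_2|$ can be made arbitrarily small by choosing a sufficiently large $A$. Choose $\varepsilon > 0$ such that estimate (1.4.14) is fulfilled. Then, for $\theta \le \theta_0 < 1$ and $|t| \le \varepsilon\sigma\sqrt{N}$,
\[ |\psi(t/(\sigma\sqrt{N}))|^N \le e^{-ct^2}, \]
so that
\[ |I_3| \le \int_{A < |t| \le \varepsilon\sigma\sqrt{N}} |\psi(t/(\sigma\sqrt{N}))|^N\,dt \le \int_{A < |t|} e^{-ct^2}\,dt, \]
and $|I_3|$ can be made arbitrarily small by the choice of sufficiently large $A$. For fixed $A$, the integral $I_1$ tends to zero because $\varphi_N(t) \to e^{-t^2/2}$ uniformly with respect to $t$ in any finite interval. Finally, with the help of estimate (1.4.17), we obtain that as $N \to \infty$,
\[ |I_4| \le \int_{\varepsilon\sigma\sqrt{N} < |t| \le \pi\sigma\sqrt{N}} |\psi(t/(\sigma\sqrt{N}))|^N\,dt = \sigma\sqrt{N} \int_{\varepsilon < |t| \le \pi} |\varphi(t)|^N\,dt \le 2\pi\sigma\sqrt{N}\, e^{-c\theta N} \to 0. \]
■

Denote by $p(u; \alpha, \beta)$ the density of the stable law with parameters $\alpha$ and $\beta$ in Zolotarev's parameterization (see [60]). If $\alpha \ne 1$, the characteristic function $f(t)$ of this distribution can be represented in the form
\[ f(t) = \exp\left\{ -|t|^{\alpha} \exp\left\{ -\frac{i\pi}{2} K(\alpha)\beta \frac{t}{|t|} \right\} \right\}, \]
where $K(\alpha) = 1 - |1 - \alpha|$. By the inversion formula,
\[ p(u; \alpha, \beta) = \frac{1}{2\pi} \int_{-\infty}^{\infty} e^{-itu} \exp\left\{ -|t|^{\alpha} \exp\left\{ -\frac{i\pi}{2} K(\alpha)\beta \frac{t}{|t|} \right\} \right\} dt. \tag{1.4.18} \]
If $N \to \infty$ and $\theta = 1$, then the distribution of $(\zeta_N - 2N)/(bN^{2/3})$, where $b = 2(2/3)^{2/3}$, is approximated by the stable distribution with parameters $\alpha = 3/2$, $\beta = -1$.

Theorem 1.4.2. If $N \to \infty$, $\theta = 1$, $b = 2(2/3)^{2/3}$, then
\[ bN^{2/3}\, P\{\zeta_N = n\} = p(u; 3/2, -1)(1 + o(1)) \]
uniformly in the integers $n$ such that $u = (n - 2N)/(bN^{2/3})$ lies in any fixed finite interval.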
38 The generalized scheme of allocation and the components of random graphs Proof. The terms of the sum ?# = ?i + • • • + ?/v are independent identically distributed random variables, and for 0=1, * , ?=1,2,..., A.4.19) and E?i = 2, since 0(e~') = 1 and B(e~x) = 1/2. The maximal span of the dis- distribution is 1; therefore, by Theorem 1.1.10, it suffices to prove that the distribution of (On — 2N)/(bN2/3) converges weakly to the stable law given in the theorem. In addition to 6(x), a(x), and B(x) defined above, we consider the function OO , k k=\ This can be expressed in terms of 9 (z): Let g(z) = (l- By using the equalities and we easily obtain zg'(z) = -36(z) + 362(z) = 36(z) - 6B(z). Integration then gives r g'iu)du= Jo JO u JO u Expressing B(z) in terms of 9{z) demonstrates that, for \z\ < 1, C(z) = A - 1A - 6{z)J - \{\ - 0(z)K. Since 6{e~l) = 1, we find that C(e~l) = 5/12. Set u(z) = 1 - 6{z), v(z) = C(z) - C(e~ We have shown that v{z) = -\u2{z)-\u3{z). If we invert this expression, we obtain two formal solutions _ 6C(Z). u(z) = ±2iVv(i) + fv(z) + O(|u(z)|3/2);
1,4 Forests of nonrooted trees 39 since u(x) > 0 and v(x) < 0 for 0 < x < e~l, we choose the solution u(z) = -2/V^) + \v(z) + O(\v(z)\V2). A.4.20) Hence, A - 6{z)J = u2(z) = -4v(z) - ^-(v(z)f/2 + O(\v(z)\2). A.4.21) The first two derivatives of C{z) are ¦t-^ kk lzk l B(z) c (z) = E ^^ = -r- /t=i OO Therefore, for real t, C(e~l+it) - C(e~x) = it/2 + O(t2). A.4.22) Now we find an expression of the characteristic function cp(t) of the random vari- variable ?i with distribution A.4.19). It is clear that From A.4.20), A.4.21), and A.4.22), we find that for z = e'7, <p(f) = l-(l-0(z)J = l+4v(z) + l-^-{v{z)?'2 + O(\v(z)\2) /73/2 /^J +O(t2), where b = 2B/3J/3. By virtue of the equality .(it\V1 \int we can rewrite the last relation as Since e~2it = \-
40 The generalized scheme of allocation and the components of random graphs as t -> 0, we find that fit) = e~2it<pit) = 1 - \bt\3/2e\p — [ + O(t2). The characteristic function of the random variable (?#¦ - 2N)/ibN2^3) is V N and converges to at any fixed t. The function fit) is the characteristic function of the stable law p(u; a, ft) with parameters a = 3/2, ft = — 1. Therefore, according to Theo- Theorem 1.1.10, as N -> oo, /3 = az} - p(M; 3/2, -1) -+ 0 uniformly in A:, where u = (k - 2N)/ibN2^). The function /?(*; 3/2, -1) is positive for any x; hence, = k} = piu; 3/2, - uniformly in k such that u = (k — 2N)/ibN2/3) lies in any fixed finite interval. We now turn to the estimate of the number of forests Fn^ with n vertices, N trees, and T = n — N edges. Theorems 1.4.1 and 1.4.2 allow us to estimate the number of forests. Theorem 1.4.3. If n -> oo and 6 = 2T/n varies such that ON -> oo and Nil - 0K -+ co, then 2rr, A.4.23) Proof. Put 0=2T/n, x = 6e'e. A.4.24) By virtue of A.4.5), F^N=n'{BN{^Np{^N = n}, A.4.25) where the parameters are chosen so that ) ^ ?™ {1A26)
1.4 Forests of nonrooted trees 41 Since m = E?i =2/B-0) = n/N, by Theorem 1.4.1, P[^N = n} = — A +"A)), A.4.27) where (l-0)B-0J (l-0)iV2' If we substitute A.4.24), A.4.25), and A.4.27) into A.4.25), we can conclude that under the conditions of the theorem, Theorem 1.4.4. Ifn —> oo and IT'/'n —> 1 so that A - 2T/n)Nl/3 -> Z?2/3u/2, -oo < u < oo, AA28) Proof. Under the conditions of the theorem, n-2N u = thus, by Theorem 1.4.2, continuity, and positivity of the density p(u; 3/2, —1), = n} = p(-v; 3/2, -1)A + o(l)). A.4.29) We chose 0 = 1 in Theorem 1.4.2; hence, x = e~l and B(x) = B(e~l) = 1/2. Having substituted these values and A.4.29) into A.4.25), we conclude that, under the conditions of Theorem 1.4.4, n\ \2NN1/6 B/3J/3 p(-v,3/2, - Although the density p(x; 3/2, -1) cannot be represented in terms of simple functions, we can use the relation p(x; a, fi) = p(—x; a, —fi) and the following series expansion for x > 0 and 1 < a < 2 for our calculations: 1 v^ „ r((« + l)/a) „ Tin ( ( l\2-a p(x; a,P)=- T(-\)n — ^x" cos — 1 + 1 + - ) 7T^Q an\ 2 \ \ n) a
42 The generalized scheme of allocation and the components of random graphs 1.5. Trees of given sizes in a random forest Let [ir — At/-(w' N) be the number of trees with r vertices in a random forest with n labeled vertices and N nonrooted trees, r = 1,2 Recall that such a forest has T = n — N edges. In this section, we consider the asymptotic behav- behavior of the random variables /xr(«, N). Following the approach established in the previous section, we use the generalized scheme of allocation of n particles to N cells determined by identically distributed random variables ?i, ...,?#¦ sucn that 2kk-2Ok~le-k0 ?=1,2,..., o < e < 2. As we have calculated, 2-6' and for 0 < 6 < 1, We will also use the notation s2r = = pr(l-Pr- = 1, 2.... The random variables /xr behave much like the corresponding variables for a random forest of rooted trees. We highlight some of these results; see [30] for a complete description. As before, let 6 = 2T/n. Again the value 6 = 1 is of particular interest, so we introduce the following notations: For r = 1,2,..., TCr = TCrF) = = <x}r@) = Pr(9), Q<0<\, PriX), 1 < 6 < 2, srF), 0<6<\, The truncated values nrF) and a}rF) allow us to summarize the rather compli- complicated behavior of /xr, r > 3, in the following two theorems. Nnr{9) ->• oo, Theorem 1.5.1. Ifn, N ->• oo an^/ r = r(«) > 3
1.5 Trees of given sizes in a random forest 43 uniformly in the integers k such that _k- NnrF) lies in any fixed finite interval. Theorem 1.5.2. Ifn, N —> oo and r = r(n) > 3 varies such that Nnr{0) —> X for some X, 0 < X < oo, then for any fixed k = 0, 1, ..., Xke~k k} (l k} (l+ The random variables ix\ and /X2, like their analogs for forests of rooted trees, have some special properties, but we will not discuss them. When edges are added sequentially to a forest, then by Theorems 1.5.1 and 1.5.2, the asymptotic behavior of ixr does not depend on 9 if 6 > 1. If Npr{\) —> oo, then the limit distribution of /xr, with similar centering and normalizing, is the standard normal distribution for all 6, 1 < 9 < 2. There are similar results for the case 6 > 1 and Npr{\) —> X for some X, 0 < X < oo, with the limit distribution of the [ir for all 6, 1 < 6 < 2, being the Poisson distribution with parameter X. Thus the point 6 = 1 can be interpreted as a critical point in the evolution of a random forest. We now prove Theorems 1.5.1 and 1.5.2. Proof of Theorems 1.5.1 and 1.5.2. According to Example 1.3.4 and Lemma 1.2.1, r=k]= where f# = ?i + • • • + $n, ^r) = ^ + • • • + ^}, the random variables ?i,..., %n', %\ , • •¦, $ are independent and identically distributed, Pr = k\B(x) Jt=l A.5.2) and the parameter x of the distribution of ?i ,...,?# may be taken arbitrarily from the domain of convergence of the series B(x).
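The representation used at the start of this proof can be checked numerically for a small forest. The sketch assumes that relation (1.5.1), obtained from Lemma 1.2.1, has the standard form of the generalized scheme of allocation,
\[ P\{\mu_r = k\} = \binom{N}{k} p_r^k (1 - p_r)^{N-k}\, \frac{P\{\zeta^{(r)}_{N-k} = n - kr\}}{P\{\zeta_N = n\}}, \]
where $p_r = P\{\xi_1 = r\}$ and the summands of $\zeta^{(r)}_{N-k}$ have the distribution of $\xi_1$ conditioned on $\xi_1 \ne r$. It compares this expression with a direct sum over ordered size vectors weighted according to (1.4.1). The function names and the values $n = 8$, $N = 3$, $r = 2$ are illustrative assumptions.

```python
from itertools import product
from math import comb, exp, factorial, lgamma, log


def pmf(theta, K):
    """p[k] = 2 k^(k-2) theta^(k-1) e^(-k theta) / (k! (2 - theta)), k = 1..K."""
    p = [0.0] * (K + 1)
    for k in range(1, K + 1):
        p[k] = exp(log(2.0) + (k - 2) * log(k) + (k - 1) * log(theta)
                   - k * theta - lgamma(k + 1) - log(2.0 - theta))
    return p


def sum_dist(p, N, K):
    """Distribution on 0..K of a sum of N independent variables with P{X = j} = p[j]."""
    dist = [1.0] + [0.0] * K
    for _ in range(N):
        new = [0.0] * (K + 1)
        for i, di in enumerate(dist):
            if di:
                for j in range(1, K - i + 1):
                    new[i + j] += di * p[j]
        dist = new
    return dist


def mu_r_via_scheme(n, N, r, theta):
    """P{mu_r = k}, k = 0..N, from the assumed form of relation (1.5.1)."""
    p = pmf(theta, n)
    p_r = p[r]
    # distribution of xi_1 conditioned on xi_1 != r
    restricted = [0.0 if k == r else pk / (1.0 - p_r) for k, pk in enumerate(p)]
    denom = sum_dist(p, N, n)[n]
    out = []
    for k in range(N + 1):
        s = n - k * r
        tail = sum_dist(restricted, N - k, n)[s] if s >= 0 else 0.0
        out.append(comb(N, k) * p_r ** k * (1.0 - p_r) ** (N - k) * tail / denom)
    return out


def mu_r_direct(n, N, r):
    """P{mu_r = k} by summing the weights prod b_{n_i}/n_i! over ordered size vectors."""
    weight = {k: (1 if k == 1 else k ** (k - 2)) / factorial(k) for k in range(1, n + 1)}
    totals = [0.0] * (N + 1)
    for sizes in product(range(1, n + 1), repeat=N):
        if sum(sizes) == n:
            w = 1.0
            for s in sizes:
                w *= weight[s]
            totals[sizes.count(r)] += w
    z = sum(totals)
    return [t / z for t in totals]


direct = mu_r_direct(8, 3, 2)
for theta in (0.3, 0.5, 0.9):
    scheme = mu_r_via_scheme(8, 3, 2, theta)
    assert all(abs(a - b) < 1e-12 for a, b in zip(scheme, direct))
```

The agreement for several values of $\theta$ illustrates the remark that the parameter of the distribution may be taken arbitrarily: it cancels in the ratio.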
44 The generalized scheme of allocation and the components of random graphs We set 9 = 2T/n. It is convenient to choose jc = 0e~° for 0 < 0 < 1 and x = e~x for 1 < 0 < 2. With these choices, A.5.1) gives ^? ^ A-5.3) where P{?l = *} = 7r*@)> A:= 1,2 , and the distribution of ?j is defined by A.5.2). Reasoning by contradiction, we see that it is sufficient to prove Theorems 1.5.1 and 1.5.2 under the assumption that 6 lies in any of the following three domains: first, where N9 -> oo and A — 9KN^- oo; second, where A — 9KN is bounded by an arbitrary constant; and, third, where A - 9KN -> -oo. Negating either theorem implies the existence of a subsequence of the parameters n, N such that 9 lies in one of these three domains for which the other conditions are satisfied but for which the conclusion is false. Therefore we assume that n, N -> oo in such a way that 0 lies in one of the domains and prove the assertions of Theorems 1.5.1 and 1.5.2 in the corresponding three cases. Consider first Theorem 1.5.1 in the first domain of 6. By the de Moivre-Laplace theorem, the binomial distribution is approximated by a normal or Poisson distri- distribution. More precisely, if Nnr(9) -> oo, then p~z /z A.5/ uniformly in k such that (k-Nnr{9)J 2NTcr(9){\ - TcrF)) lies in any fixed finite interval. The probability P{t;N =n} from the denominator of A.5.3) has been estimated in the previous section. Applying Theorem 1.4.1, we have for# in the first domain, = n} = —7==(l+o(l)), A.5.5) where A-0X2-6) To find the asymptotics of the numerator of A.5.3), we begin by calculating the
1.5 Trees of given sizes in a random forest 45 first and second moments: 1 1 a} = a}{9) = where \i = E?, =2/B-9). A proof similar to that of Theorem 1.4.1 shows that a normal approximation is valid for the sum ffi = ^(r) -\ h ?^. More precisely, if n, N -^ oo such that >• oo and A - 0KN ->• oo, then ar v 27r A1" uniformly in r > 3 and s such that (s — Nmr)/(a^/N) lies in any fixed finite interval. We now use A.5.6) with s = n—kr and N—k summands to obtain an asymptotic expression for P{$N_k = n — kr}. Since k = where we have N - k = N(l - pr) - urarrVN = N(l - pr) A Ur°rr ) . A.5.7) It is easy to see that arr/{\ — pr) is bounded, and for ur lying in any finite interval, N-k=N(l - pr)(l + O(N~l/2)). A.5.8) The exponent in A.5.6) may now be written as 2 (n — kr — NmrJ ~ 2a}{N - k) ' Taking into account A.5.7), A.5.8), and the equalities Priix — r) a — r mr — \x = , mr — r = 1 - Pr 1 - Pr which hold for 9 in the first domain, we obtain k(mr — r) — N{mr — fx) pr' (fx — r)(k — Npr) u = ar(N-kI/2 ~ aarr(N-kI/2 p\t2{n - r) o(\ ~ Pr)l/2 ur{\ + o
46 The generalized scheme of allocation and the components of random graphs Applying A.5.7) gives 1 2 2 2 P|^m_jc = n — kf\ = :=? ' (I t <yr^/2TcN(\ — Pr) A.5.9) When we substitute A.5.4), A.5.5), and A.5.9) into A.5.3), we see that under the conditions of Theorem 1.5.1 with 6 in the first domain, this expression transforms into the product of an exponent and a coefficient. The coefficient of the expo- exponent is -Pr) 1 1 r(\ - Pr){\ - Pr{n - r Combining the exponents from A.5.4) and A.5.9) yields the resulting exponent {k-NprJ pr{n - VJ(k - NPrJ (k - NPrJ + o(l) = 1- o(l). 2Npr{\ - pr) 2a2(l - Pr)a}rN 2arrN Thus Theorem 1.5.1 is proved for 6 varying in the first domain. Under the conditions of Theorem 1.5.2, k is fixed, and when we apply A.5.5) and A.5.6) with the corresponding parameters, we obtain the ratio Therefore the assertion of Theorem 1.5.2 follows from the Poisson approximation of the first factor in A.5.3). In the second domain, we choose the parameter ofthe distribution of ?i, ...,?# to be 1. If Npr{\) -> oo, then kj,r,^,,^ „„, ^ j2nNpr(l)(l-pr(l))e ' ^ + ^^ A.5.10) uniformly in k such that k-Npr{\) lies in any fixed finite interval. Applying Theorem 1.4.2 gives = n} = p(u; 3/2, -1)A + o(l)) A.5.11) uniformly in n such that u = (n - 2N)/(bN2/3) lies in any fixed finite interval.
1.5 Trees of given sizes in a random forest 47 Restricting the random variables ^r\ ..., ^ does not affect their maximum span and convergence to the stable law with density p{u\ 3/2, —1). The only difference is that now the mean of a summand is E?j = mr(\) = 2/A - />,-(!)). Therefore, as j —> oo, (l-/v(DJ/3 [ (CJ - ymr(D)(l - P.(l)J/3 / - ymr(l))(l - r(l)J/3 { uniformly in / such that - jmr{\)){\ - pr(l)J/3 v = bj2'3 lies in any fixed finite interval. By substituting N — k for j and n — kr for / and recalling that k = Npr(l) + Zy/Npr(\)(\-pr(\)), where z is bounded, we have bN2/3P{^_k = n-kr}= p(u; 3/2, -1)A + o(l)) A.5.12) uniformly in r > 3, where, as in A.5.11), u = (n — 2N)/{bN2^), since n-2N v = 6/73 = Thus the asymptotics of P{^n — n] and P{Q_k = n — &r} is the same and their ratio in A.5.3) tends to 1. Therefore the asymptotics of P{/xr = k) is determined by the first factor and coincides with the asymptotics of the corresponding binomial probability. Theorems 1.5.1 and 1.5.2 have now been proved in the second domain. It remains to prove the theorems for the third domain, where A —IT/n^N —> —oo. We choose 9 = 1 in the distribution of the random variables ?i,..., ?# and prove that in A.5.3) the ratio -k = n ~ kr)/P{i;N = n}^\ A.5.13) uniformly in r, and k = Npr{\) + z^Npr{\){\ — pr{\)), where z lies in any fixed finite interval. In this case, A — 2T/nKN -> —oo, so the values n for the sum ?#¦ and the values n — kr for the sum $N_k lie m what is called the region of large deviations. Therefore we need to apply the theorem on large deviations. We will not give the
48 The generalized scheme of allocation and the components of random graphs proof, but the main idea is simple: If the distribution of a sum of independent identically distributed integer-valued random variables with zero mean converges to a stable law with parameter a, 1 < a < 2, then the major contribution to a large deviation of the sum is made by only one of the summands (see [137]). Applying this theorem to the sum ?#¦ gives the following result for 9 in the third domain. If n, N -> oo such that N(\ - 2T/nK -> -oo, then P{$N = n) = P{t;N - 2N = n - IN) -2 = n-2N}(\+o(\)) N A (n — (r) The theorem given in [137] cannot be applied to the sum t;^ , since its sum- summands become noninteger after centering by the expectation mr. Britikov, using the method given in [137], along with ideas from [58] and [113], proved in [30] that the probability P{i;^_k = n — kr} has the same asymptotics as P{$n = n}. More precisely, if n, N -> oo such that N(l — 2T/nK -> —oo, then N-k = n-Jcr} = P{^ -(N- k)mr = n - kr - (N - k)mr] = (N - k)P{^r) - mr = n - kr - (N - k)mr] = (N - k)P{^r) =n-2N+ 1/2 N uniformly in r > 1 and k such that lies in any fixed finite interval. Thus the ratio in A.5.3) tends to 1, and the asymptotics of P{/xr = k) is deter- determined by the first factor and coincides with the asymptotics of the corresponding binomial probability. This proves Theorems 1.5.1 and 1.5.2 in the third domain. The proof of Theorems 1.5.1 and 1.5.2 is now complete. ¦ 1.6. Maximum size of trees in a random forest The results of the previous section give some information on the behavior of the maximum size r}(N) of trees in a random forest from Tn,N with T = n-N edges. Indeed, if 9 = 2T/n -> 0 and there exists r = r(n, N) such that Npr(9) -> oo and Npr+\{9) —> X, 0 < X < oo, then the distribution of the number /xr of trees of size r approaches a normal distribution, and the distribution of/xr+! approaches
1.6 Maximum size of trees in a random forest 49 the Poisson distribution with parameter X. This implies that the limit distribution of the random variable r](M) is concentrated on the points r and r + 1. If 6 = 2T/n -> y, y > 0, then there are infinitely many r — r(n, N) such that the distribution of fxr approaches a Poisson distribution; hence, the distribution of r)(N) is scattered more and more as y increases. If 0 < y < 1, then the limit distribution is concentrated on a countable set of integers, whereas if y > 1, then r)(N) must be normalized to have a limit distribution, and the normalizing values tend to infinity at different rates, depending on the region of 6. Thus, it should be possible to prove the limit theorems for rj(N) when T/n -> 0 by using results on \xr from the previous section. But if2T/n —>• y for y > 0, this approach may not work, and even if it did, the proofs would not be simple. Therefore we choose instead to use the approach based on the generalized scheme of allocation. Let ?i ,...,?# be random variables with distribution 2rr-26r-le-re =k}= lBe) • 0<9<2, A.6.1) where k = 1, 2, .... We choose 0 = 2T/n. Then, according to Lemma 1.2.2, where h (n h (n bemg independent identically distributed random variables such ,..., that Pff^ = A:} = P{^i = k | ?i < r}, k=l,...,r, A.6.3) and Pr = Pr{6) = P{^i < r} = ^^E). A.6.4) /t=l We now state the theorems that completely describe the behavior of rj(N), deferring their proofs. Our procedure follows Britikov [28]. Theorem 1.6.1. Ifn, N -> oo, 6 = 2T/n -> 0, and the integers r = r(n, N) > 1 vary such that Npr{6) -> oo and Npr+\F) -> XforO < X < oo,
50 The generalized scheme of allocation and the components of random graphs Note that if A. 7^ 0 in the conditions of the theorem, then Npr{9) —> 00 without any additional requirements. In particular, the conditions of the theorem are fulfilled if T/n{r-X)/r -+ p, 0 < p < 00. Under this condition, Theorem 1.6.1 was proved by Erdos and Renyi [37], whose well-known paper provided the only results on the behavior of rj(N) until Britikov's work seventeen years later [28]. Theorem 1.6.2. Ifn, N -> 00, 6 = 2T/n -> y, 0 < y < 1, then for any fixed integer k, (y - 1 -logyM/2 P{r](N) ~ [a] <k}= exp -^ \^=^ >y~ ~ x)V2tt log n - § log log n e - 1 - log e ' ant/ [a] and {a} denote, respectively, the integer and fractional parts of a. Theorem 1.6.3. Ifn, N ^ 00, 0 = 2T/n ->• 1, and N(l - Of ->• 00, r/zen/or any fixed z, where ivgn - a = 1"^ where P = — log^e1"^) and u is the root of the equation Theorem 1.6.4. Ifn,N^- 00 such that Nl/3(l - 2T/n) -> v, -00 < v < 00, then for any fixed positive z, 00 J ( \ f P{y Xl Xs> 3/2' 1} A A Is{w,y)= / — dx{---dxs, A = {(*!, xs): Xj >w, j = 1, ...,s}, and p(y; 3/2, — 1) is the density of the stable law with parameters a = 3/2,
1.6 Maximum size of trees in a random forest 51 Theorem 1.6.5. ffn, N —> oo, N(\ - 2T/nK -> -oo, then for any fixed z, -oo We will prove Theorems 1.6.1-1.6.5 with the help of relation A.6.2). Under the conditions of Theorems 1.6.1-1.6.3, and the limit distribution of ^(at) is the same as the limit distribution of the maxi- maximum of the random variables ?i ?^. Therefore we first obtain some auxiliary results on the asymptotic behavior of oo Pr = Lemma 1.6.1. Ifn, N —> oo, 0 = 2T/n —> 0, and the integers r = r(n, N) > 1 vary such that Npr{6) -> oo, Npr+\{6) -> X, 0 < X < oo, then NPr-i ->• co, NPr ->• X, NPr+i ->• 0. Proof. Under the conditions of the lemma, x = 0e~e -> 0. It follows from A.6.3) that Pr = JTPr+s(e) = Pr+l(f» (l + ? ^l) , A.6.6) Taking into account the bounds for factorials we find from A.6.1) that where ci is a constant. Hence, Q) K cixe pr+lF) - I - xe =
52 The generalized scheme of allocation and the components of random graphs as q _> o. Now by virtue of A.6.6) and A.6.7), OO, NPr+l ~+ 0. Note that if X / 0, then Npr@) —> oo without any additional conditions, so this requirement may be excluded from the conditions of the lemma if A, / 0. Indeed, Npr@) = Npr+l(9) Pr+l(O) Since x —> 0 and pr+l{9) \r+\J x there exists a constant ci such that Npr{9) > C2Npr+\(9)/x and Npr{9) ->¦ oo. Lemma 1.6.2. Ifn, N -> oo, 9 = 2T/n -+ y,0 < y < 1, and r = r(n, N) -+ oo, then NPr = Npr(9)c(l-c) where c = ye^~y. Proof. It is clear that oo Pr(9) and pr(9) \r + sj Moreover, there exist constants c3 > 0 and q < 1 such that Pr+s@)lpr@) < c3(xe)s < c3qs. Therefore the series X^i pr+s(9)/pr(9) converges uniformly and we can pass to the limit under the sum so that OO OO E . , 1 - c
1.6 Maximum size of trees in a random forest 53 Lemma 1.6.3. Ifn, N -> oo, 9 = 2T/n -> 1, and N{\ - 0K -* oo, then for any fixed z, NPr -> e~\ where r is an integer such that fir = u + z + o(l), fi = — log^e1^), and it is the root of the equation /o\l/2 Proof. It is clear under the conditions of the lemma that fi = —\ogFex~e) -> 0 and u -^ oo, since Nfi3!2 ^ oo by virtue of the condition N{\ — 6K -^ oo. We apply Stirling's formula and obtain oo , t_? t / « \ 1/2 A:=r+1 The sum k>r is an integral sum of the function/(jy) = y 5^2e y with step f3 and is approximated by the corresponding integral: OO y-5/2e~y dy(l Therefore j / 9 \ NPr = ( - j A.6.8) By definition, rfi = u + z + o(l) and /2\l/2 Substituting these expressions into A.6.8) yields Now we are ready to prove the theorems of this section.
54 The generalized scheme of allocation and the components of random graphs Proof of Theorems 1.6.1-1.6.3. By applying Lemma 1.6.1, we find that under the conditions of Theorem 1.6.1, A - Pr_i) N 0, A - Pr) N ~\ A - Pr+l) N 1 as N ->¦ oo. These relations, together with A.6.5), whose proof is pending, imply the assertion of Theorem 1.6.1. Let _ \ogn - | log log n 0-l-log0 ' and choose r = [a] + k, where k is a fixed integer. Under the conditions of Theorem 1.6.2, r = [a] + k ->¦ oo and according to Lemma 1.6.2, NPr = Npr@)c{\ - c)-\\ + where c = ye1"^. It is easy to see that ... r!B-0) Thus (/ 1 +0A)), and consequently, Under the conditions of Theorem 1.6.3, Lemma 1.6.3 shows that NPr and - Pr) N Thus, to complete the proof of Theorems 1.6.1-1.6.3, it remains to verify A.6.5) under each set of conditions. Since ON -+ oo and N{\ - OK -> oo, by Theorem 1.4.1 the random sum ^ is asymptotically normal, and
1.6 Maximum size of trees in a random forest 55 where A-0X2-0) While estimating the asymptotic behavior of A — Pr)N in Lemmas 1.6.1 -1.6.3, we determined the choice of r. We now prove the central limit theorem for the sum lx for these choices of r. Set Bm = a(9)Nx/2. The characteristic function of the random variable f [ — m{9), where m{9) = E?i, is ,-itm@) J_ . e-itm(9) r k=l l~Fr \ k>r where (pit) is the characteristic function of the random variable ?i. Hence, the characte written characteristic function <pr(t, 6) of the random variable (^ ~~ Nm (O))/Bn can be 7-itNm{6)/BN ¦ ' ' N According to Theorem 1.4.1, the distribution of (?# — Nm(9))/Bn converges to the standard normal law, and consequently, It is clear that J2 Pr@)eitk/BN = Pr + J2 Pr@)(eitk/BN - l) = Pr + O k>r for and it is not difficult to prove in each of the three cases that A.6.10) k>r Estimates A.6.9) and A.6.10) imply that for any fixed /, and the distribution of (?Jy — Nm@))/BN converges to the standard normal distribution. The local convergence needed for the proof of A.6.5) can be proved in the standard way. Thus the ratio in A.6.5) tends to 1, and this, together with the estimates of A - Pr)N, completes the proof of Theorems 1.6.1-1.6.3. ¦
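The relation on which the proofs above rest can also be verified numerically for a small forest. The sketch assumes that (1.6.2), obtained from Lemma 1.2.2, has the standard form
\[ P\{\eta_{(N)} \le r\} = (1 - P_r)^N\, \frac{P\{\zeta^{(r)}_N = n\}}{P\{\zeta_N = n\}}, \]
where $P_r = P\{\xi_1 > r\}$ and the summands of $\zeta^{(r)}_N$ have the distribution (1.6.3) of $\xi_1$ conditioned on $\xi_1 \le r$. The result is compared with a direct sum over ordered size vectors. The function names and the values $n = 8$, $N = 3$, $\theta = 0.5$ are illustrative assumptions.

```python
from itertools import product
from math import exp, factorial, lgamma, log


def pmf(theta, K):
    """p[k] = 2 k^(k-2) theta^(k-1) e^(-k theta) / (k! (2 - theta)), k = 1..K."""
    p = [0.0] * (K + 1)
    for k in range(1, K + 1):
        p[k] = exp(log(2.0) + (k - 2) * log(k) + (k - 1) * log(theta)
                   - k * theta - lgamma(k + 1) - log(2.0 - theta))
    return p


def sum_dist(p, N, K):
    """Distribution on 0..K of a sum of N independent variables with P{X = j} = p[j]."""
    dist = [1.0] + [0.0] * K
    for _ in range(N):
        new = [0.0] * (K + 1)
        for i, di in enumerate(dist):
            if di:
                for j in range(1, K - i + 1):
                    new[i + j] += di * p[j]
        dist = new
    return dist


def max_tree_cdf_via_scheme(n, N, r, theta):
    """P{eta_(N) <= r} from the assumed form of relation (1.6.2); requires 1 <= r <= n."""
    p = pmf(theta, n)
    head = sum(p[1:r + 1])                           # P{xi_1 <= r} = 1 - P_r
    truncated = [pk / head if 1 <= k <= r else 0.0 for k, pk in enumerate(p)]
    return head ** N * sum_dist(truncated, N, n)[n] / sum_dist(p, N, n)[n]


def max_tree_cdf_direct(n, N, r):
    """P{eta_(N) <= r} by summing the weights prod b_{n_i}/n_i! over ordered size vectors."""
    weight = {k: (1 if k == 1 else k ** (k - 2)) / factorial(k) for k in range(1, n + 1)}
    favourable = total = 0.0
    for sizes in product(range(1, n + 1), repeat=N):
        if sum(sizes) == n:
            w = 1.0
            for s in sizes:
                w *= weight[s]
            total += w
            if max(sizes) <= r:
                favourable += w
    return favourable / total


for r in range(1, 9):
    assert abs(max_tree_cdf_via_scheme(8, 3, r, 0.5) - max_tree_cdf_direct(8, 3, r)) < 1e-12
```

With $n = 8$ and $N = 3$ the largest tree has at least 3 and at most 6 vertices, so both functions return 0 for $r \le 2$ and 1 for $r \ge 6$.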
56 The generalized scheme of allocation and the components of random graphs To prove Theorem 1.6.4, the following lemma is needed. Lemma 1.6.4. If N —>¦ oo, the parameter 9 in the distribution A.6.1) equals Arl/3A — IT In) ->¦ v, and r = zN2/3, where z is a positive constant, then where C ?-3/21 / °° 1 / 3 \5 Is(z,y)\, and Is(z, y) is defined in Theorem 1.6.4. Proof. As yV ->• oo, r~2e~k /2\1/2 l 2kr~2e~k /2\1/2 Pk = Pkd) = = l-J k~5/2(l + o(l» A.6.11) uniformly in k > r. It is clear that itk i ! ^rtk y5'2 The last sum is an integral sum of the function y~5/2eity with step l/(bN2/3); hence, Set Then l#(*,z)l < H(O,z) = -^— f y-5l2dy= Z-—=. A.6.13) Taking into account b = 2B/3J/3, we obtain from A.6.12) and A.6.13) that ^ f itk 1 / 2 \1/2 ^ 1 [ itk \ / 1 \ A:>r t^. A.6.14) In particular, A.6.15)
1.6 Maximum size of trees in a random forest 57 The characteristic function (prit, 1) of the random variable (?^ — 2N)/ibN ' ) can be written N where (pit, 1) is the characteristic function of ?i — E?i. Note that E?i = 2 in this case. It follows from A.6.13), A.6.14), and Theorem 1.4.2 that k>r where fit) is the characteristic function of the stable distribution with density piy; 3/2, —1). Thus, for any fixed t, as N —> oo, (prit, 1) -> git, z) = fit) exp{-Hit, z) + H@, z)}. The function git, z) is continuous; therefore, by Theorem 1.1.9, it is a characteristic function. Since \g(t, z)\ is integrable, it corresponds to the density The span of the distribution of fj is 1; therefore, by Theorem 1.1.10, the local convergence is valid. Thus it remains to show that /(z, y) has the form given in Theorem 1.6.4. Representing e~H^^ by its Taylor series gives Yj^y), A.6.16) where 1 C°° fsiz, y) = ~ e-ltyfit)Hsit, z) dt. 2n y_oo It is easy to see that the function IJnz^l^Hit, z) is the characteristic function of the distribution with density Pziy) = |z3/V5/2, y>z. A.6.17)
58 The generalized scheme of allocation and the components of random graphs Therefore the function is the characteristic function of the sum ft + ft\ H h As of independent random variables, where ft has the stable law with density p{y; 3/2, — l)andjSi,..., fts are identically distributed with density p:{y). The density of the sum ft + ft\ -\ h fts is where Is(t, y) is defined in Theorem 1.6.4. Thus 1 C°° / 3 V / ef(t)H(t,y)dt (=) Is(t,y). When we substitute this expression into A.6.16), we find that A.6.18) Taking into account A.6.15), Theorem 1.4.2, and A.6.18), we see that Theo- Theorem 1.6.4 follows from A.6.2). To prove Theorem 1.6.5 with the help of A.6.2), we need to know the asymptotic behavior of large deviations of PRjy" = n}. We give that information without proof (see [28]). Lemma 1.6.5. Ifn, N —> oo, the parameter 9 in the distribution A.6.1) equals 1, N{\ — 271/nK —>¦ —oo, and r = n — 2N — bzN2^, where z is a constant, then -: 3/2, - A.6.19) The assertion of Theorem 1.6.5 follows from A.6.19), Theorem 1.4.2, and the fact that N Pr -^ 0 under the condition of Theorem 1.6.5. 1.7. Graphs with unicyclic components A graph is called unicyclic if it is connected and contains only one cycle. The number of edges of a unicyclic graph coincides with the number of its vertices. Let Un denote the set of all graphs with n vertices where every connected component is unicyclic. Any graph fromZ4 has n edges. In this section, we study the structure of a random graph from Un. We follow the general approach described in Section 1.2. As usual, denote by un the number of graphs in Un; we will study un as n ->¦ oo. Let bn be the number of unicyclic graphs with n vertices, and b^ the number of
unicyclic graphs with $n$ vertices, where the cycle has size $r$. The cycle of a unicyclic graph is nondirected; in other respects, a unicyclic graph is similar to the connected graph of a mapping of a finite set into itself. Let $d_n$ be the number of connected graphs of mappings of a set with $n$ labeled vertices into itself, and $d_n^{(r)}$ the number of such graphs with the cycle of size $r$. It is easy to see that

$$b_n^{(1)} = d_n^{(1)}, \qquad b_n^{(2)} = d_n^{(2)}, \qquad b_n^{(r)} = d_n^{(r)}/2, \quad r \ge 3.$$

Introduce the generating functions

$$B(x) = \sum_{n=1}^{\infty} \frac{b_n x^n}{n!}, \qquad d(x) = \sum_{n=1}^{\infty} \frac{d_n x^n}{n!}.$$

These functions can be represented in terms of the function

$$\theta(x) = \sum_{n=1}^{\infty} \frac{n^{n-1} x^n}{n!},$$

which is the root of the equation $\theta e^{-\theta} = x$ in the interval $[0, 1]$. This function was used in Section 1.4. Taking into account the notation introduced here and using the results of Section 1.4, we see that

$$d(x) = -\log(1 - \theta(x)), \qquad c(x) = \tfrac{1}{2}\bigl(1 - (1 - \theta(x))^2\bigr).$$

Since $b_n = b_n^{(1)} + \dots + b_n^{(n)}$, we have

$$B(x) = \tfrac{1}{2} d(x) + \theta(x) - \tfrac{1}{2} c(x) = -\tfrac{1}{2}\log(1 - \theta(x)) + \theta(x) - \tfrac{1}{4}\bigl(1 - (1 - \theta(x))^2\bigr). \tag{1.7.1}$$

In accordance with the general model of Section 1.4, let us introduce independent identically distributed random variables $\xi_1, \dots, \xi_N$ for which

$$P\{\xi_1 = k\} = \frac{b_k x^k}{k!\, B(x)}, \qquad k = 1, 2, \dots. \tag{1.7.2}$$

The number of graphs in $U_n$ with $N$ components can be represented in the form

$$\frac{n!}{N!} \sum_{n_1 + \dots + n_N = n} \frac{b_{n_1} \cdots b_{n_N}}{n_1! \cdots n_N!}. \tag{1.7.3}$$
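The identity $d(x) = -\log(1 - \theta(x))$ can be checked numerically. The following is only an illustrative sketch, not part of the original argument: it solves $\theta e^{-\theta} = x$ by bisection on $[0, 1]$ and compares $-\log(1-\theta)$ with a truncated series $\sum d_n x^n / n!$, using the well-known closed form $d_n = (n-1)! \sum_{k<n} n^k / k!$. The function names, the truncation length, and the test point $x = 0.2$ are assumptions made for the example.

```python
import math
from fractions import Fraction

def theta(x, tol=1e-15):
    """Root of theta * exp(-theta) = x in [0, 1], by bisection (needs 0 <= x <= 1/e)."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if mid * math.exp(-mid) < x:   # theta * exp(-theta) is increasing on [0, 1]
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def d_coeff(n):
    """d_n / n! as an exact fraction, where d_n = (n-1)! * sum_{k<n} n^k / k!."""
    s = sum(Fraction(n ** k, math.factorial(k)) for k in range(n))
    return s / n                       # (n-1)! / n! = 1/n

def d_series(x, terms=120):
    """Truncated series sum_{n<=terms} d_n x^n / n! (converges for x < 1/e)."""
    return sum(float(d_coeff(n)) * x ** n for n in range(1, terms + 1))

x = 0.2
lhs = d_series(x)
rhs = -math.log(1 - theta(x))
print(lhs, rhs)
```

The two printed values should agree to many digits for any $x$ well inside the disc of convergence $|x| < e^{-1}$; closer to $e^{-1}$ more terms are needed.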
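In what follows, we choose

$$x = \bigl(1 - 1/\sqrt{n}\bigr)\, e^{-1 + 1/\sqrt{n}}.$$

Theorem 1.7.1. As $n \to \infty$,

$$u_n = \frac{\sqrt{2\pi}\, e^{3/4}}{2^{1/4}\, \Gamma(1/4)}\, n^{n - 1/4}\, (1 + o(1)),$$

where

$$\Gamma(p) = \int_0^{\infty} x^{p-1} e^{-x}\, dx$$

is the Euler gamma function.

Before proving Theorem 1.7.1, we will prove some auxiliary results.

Lemma 1.7.1. For $x = (1 - 1/\sqrt{n})\, e^{-1 + 1/\sqrt{n}}$,

$$(1 - \theta(x e^{it}))^2 = \frac{1}{n} - 2it + \varepsilon_1(t) + \varepsilon_2(t, n),$$

where $\varepsilon_1(t)/t \to 0$ as $t \to 0$ uniformly in $n$ and $|\varepsilon_2(t, n)| \le 2|t|/\sqrt{n}$.

Proof. We found in Section 1.4 that $u(w) = (1 - \theta(w))^2$ can be written as

$$u(w) = 1 - 2 \sum_{k=1}^{\infty} \frac{k^{k-2} w^k}{k!}, \qquad |w| \le e^{-1}. \tag{1.7.4}$$

It is clear that $\theta(x) = 1 - 1/\sqrt{n}$ and $\theta(e^{-1}) = 1$; therefore $u(e^{-1}) - u(x) = -1/n$. With this equality, the observation that $x < e^{-1}$, and the inequality $|e^{itk} - 1| \le k|t|$, we obtain the estimates

$$|\varepsilon_2(t, n)| = \bigl|u(x e^{it}) - u(e^{-1+it}) - 1/n\bigr| = 2 \left| \sum_{k=1}^{\infty} \frac{k^{k-2} (e^{-k} - x^k)(e^{itk} - 1)}{k!} \right| \le 2 \sum_{k=1}^{\infty} \frac{k^{k-1} (e^{-k} - x^k)\, |t|}{k!} = 2|t| \bigl(\theta(e^{-1}) - \theta(x)\bigr) = 2|t|/\sqrt{n}. \tag{1.7.5}$$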
The function $u(e^{-1+it})$ has the first derivative $-2i$ at the point $t = 0$; thus, as $t \to 0$,

$$u(e^{-1+it}) = -2it + o(t). \tag{1.7.6}$$

The assertion of the lemma follows from (1.7.4), (1.7.5), and (1.7.6). ∎

Lemma 1.7.2. If $n \to \infty$, $N = \alpha \log n + o(\log n)$, where $\alpha$ is a positive constant, then

$$n P\{\xi_1 + \dots + \xi_N = k\} = \frac{z^{\alpha - 1} e^{-z/2}}{2^{\alpha}\, \Gamma(\alpha)}\, (1 + o(1))$$

uniformly in $k$ such that $z = k/n$ lies in any fixed interval of the form $0 < z_0 \le z \le z_1 < \infty$.

Proof. The characteristic function of the sum $(\xi_1 + \dots + \xi_N)/n$ is equal to $\varphi_N(t) = \varphi^N(t/n)$, where $\varphi(t)$ is the characteristic function of $\xi_1$. It is clear that

$$\varphi(t) = B(x e^{it})/B(x).$$

Lemma 1.7.1 and equation (1.7.1) give

$$4B(x e^{it/n}) = -\log\frac{1 - 2it}{n} + 3 + o(1).$$

Therefore

$$\varphi\!\left(\frac{t}{n}\right) = \frac{B(x e^{it/n})}{B(x)} = 1 - \frac{\log(1 - 2it)}{\log n} + o\!\left(\frac{1}{\log n}\right),$$

and if $N = \alpha \log n + o(\log n)$, then for any fixed $t$,

$$\varphi_N(t) = \varphi^N(t/n) = \left(1 - \frac{\log(1 - 2it)}{\log n} + o\!\left(\frac{1}{\log n}\right)\right)^{N} \to (1 - 2it)^{-\alpha},$$

and the distribution of $(\xi_1 + \dots + \xi_N)/n$ converges weakly to the distribution with density

$$\frac{z^{\alpha - 1} e^{-z/2}}{2^{\alpha}\, \Gamma(\alpha)}, \qquad z > 0,$$

that is, to the chi-square distribution with $2\alpha$ degrees of freedom, which corresponds to the characteristic function $(1 - 2it)^{-\alpha}$.

The local convergence can be proved in the usual way by using Lemmas 1.12.3-1.12.7 from [78]. ∎
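The limit law in Lemma 1.7.2 is a gamma distribution with shape $\alpha$ and scale 2. As a hedged illustration (a simulation sketch, not the author's proof), the code below samples that gamma law with the standard-library generator and compares the empirical characteristic function with $(1 - 2it)^{-\alpha}$. The sample size, the seed, the value $t = 1$, and the helper names are assumptions chosen for the example.

```python
import cmath
import math
import random

def gamma_density(z, alpha):
    """Limit density z^(alpha-1) * exp(-z/2) / (2^alpha * Gamma(alpha)) for z > 0."""
    return z ** (alpha - 1) * math.exp(-z / 2) / (2 ** alpha * math.gamma(alpha))

def empirical_cf(samples, t):
    """Empirical characteristic function: average of exp(i * t * X_j)."""
    return sum(cmath.exp(1j * t * s) for s in samples) / len(samples)

random.seed(12345)
alpha, t = 0.25, 1.0
# random.gammavariate(shape, scale): shape alpha and scale 2 give the density above
samples = [random.gammavariate(alpha, 2.0) for _ in range(100_000)]
emp = empirical_cf(samples, t)
exact = (1 - 2j * t) ** (-alpha)     # principal branch, continuous in t
print(emp, exact)
print(gamma_density(1.0, alpha))     # density at z = 1 for alpha = 1/4
```

With $\alpha = 1/4$ and $z = 1$ the density value is $e^{-1/2}/(2^{1/4}\Gamma(1/4))$, which is the constant that appears when the lemma is applied with $k = n$.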
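Let $u_{n,N}$ be the number of graphs in $U_n$ with $N$ components and

$$\lambda_n = \tfrac{1}{4} \log n, \qquad u = \frac{N - \lambda_n}{\sqrt{\lambda_n}}.$$

Lemma 1.7.3. If $n \to \infty$, then

$$u_{n,N} = \frac{\sqrt{2\pi}\, e^{3/4}}{2^{1/4}\, \Gamma(1/4)}\, n^{n - 1/4}\, \frac{\lambda_n^N e^{-\lambda_n}}{N!}\, (1 + o(1))$$

uniformly in $N$ such that $|u| \le (\log n)^{1/4}$.

Proof. It is clear that

$$u_{n,N} = \frac{n!}{N!} \sum_{n_1 + \dots + n_N = n} \frac{b_{n_1} \cdots b_{n_N}}{n_1! \cdots n_N!} = \frac{n!\, (B(x))^N}{N!\, x^n}\, P\{\xi_1 + \dots + \xi_N = n\}. \tag{1.7.7}$$

By putting $\alpha = 1/4$ in Lemma 1.7.2, we obtain

$$n P\{\xi_1 + \dots + \xi_N = n\} = \frac{e^{-1/2}}{2^{1/4}\, \Gamma(1/4)}\, (1 + o(1)) \tag{1.7.8}$$

uniformly in $N$ when $|u| \le (\log n)^{1/4}$. The assertion of the lemma follows from (1.7.7) and (1.7.8), since

$$B(x) = \tfrac{1}{4} \log n + \tfrac{3}{4} + o(1), \qquad x^n = e^{-n - 1/2}\, (1 + o(1)). \tag{1.7.9}$$

∎

The assertion of Theorem 1.7.1 can be obtained by summing $u_{n,N}$ over $N$. Lemma 1.7.3 estimates $u_{n,N}$ for $N$ close to $\lambda_n$. The following lemmas give estimates of $u_{n,N}$ for the other values of $N$ needed in the proof.

Lemma 1.7.4. For any fixed $\alpha_0$, $\alpha_1$, $0 < \alpha_0 < \alpha_1 < \infty$, there exists a constant $c_1$ such that for $\alpha_0 \log n \le N \le \alpha_1 \log n$,

$$u_{n,N} \le c_1\, n^{n - 1/4}\, \frac{\lambda_n^N e^{-\lambda_n}}{N!}.$$

Proof. It follows from Lemma 1.7.2 that there exists a constant $A$ such that

$$n P\{\xi_1 + \dots + \xi_N = n\} \le A \tag{1.7.10}$$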
for $\alpha_0 \log n \le N \le \alpha_1 \log n$. Indeed, if (1.7.10) did not hold, then a sequence of the parameters $n \to \infty$, $N = \alpha \log n + o(\log n)$ would exist for which the assertion of Lemma 1.7.2 would not be true. Lemma 1.7.4 then follows from (1.7.7), (1.7.9), and (1.7.10). ∎

Lemma 1.7.5. If $N \le \log n$, then there exists a constant $c_2$ such that

$$n P\{\xi_1 + \dots + \xi_N = n\} \le c_2 \log n.$$

Proof. It is well known that

$$d_m = (m - 1)! \sum_{k=0}^{m-1} \frac{m^k}{k!}.$$

Indeed, since the number of forests with $n$ nonroot vertices and $N$ rooted trees labeled $1, \dots, N$ is $N(n + N)^{n-1}$, the number $d_m^{(r)}$ of connected graphs of mappings of an $m$-set into itself with the cycle of size $r$ can be represented as

$$d_m^{(r)} = \binom{m}{r} (r - 1)!\, r\, m^{m-r-1} = \frac{m!\, m^{m-r-1}}{(m - r)!}.$$

Here $\binom{m}{r}$ is the number of possible choices of $r$ vertices that constitute the cycle; $(r - 1)!$ is the number of cycles that can be constructed from $r$ vertices; and $r\, m^{m-r-1}$ is the number of forests with $r$ cyclic vertices as the roots. Hence,

$$d_m = \sum_{r=1}^{m} d_m^{(r)} = (m - 1)! \sum_{k=0}^{m-1} \frac{m^k}{k!}.$$

As $m \to \infty$,

$$d_m = \tfrac{1}{2} (m - 1)!\, e^m\, (1 + o(1)),$$

and there exists a constant $c_3$ such that

$$b_m \le d_m \le c_3 (m - 1)!\, e^m.$$

Moreover, $B(x) = \log n\, (1 + o(1))/4$ and $x^m \le e^{-m}$ for all $m \ge 0$. Therefore there exists a constant $c_2$ such that

$$P\{\xi_1 = m\} \le \frac{c_2}{m \log n}. \tag{1.7.11}$$

It is clear that

$$\{\xi_1 + \dots + \xi_N = n\} \subseteq \bigcup_{i=1}^{N} \{\xi_i \ge n/N\}.$$
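Since $P\{\xi_1 = k\}$ decreases as $k$ increases, we have

$$P\{\xi_1 + \dots + \xi_N = n\} \le N \sum_{k \ge [n/N]} P\{\xi_1 = k\}\, P\{\xi_2 + \dots + \xi_N = n - k\} \le N \max_{k \ge [n/N]} P\{\xi_1 = k\} = N P\{\xi_1 = [n/N]\}.$$

The lemma now follows from (1.7.11). ∎

Lemma 1.7.6. For $N \le \log n$,

$$u_{n,N} \le c_4\, n^{n - 1/4} \log n\, \frac{\lambda_n^N e^{-\lambda_n}}{N!},$$

where $c_4$ is a constant.

This lemma follows from (1.7.7), (1.7.9), and Lemma 1.7.5.

Proof of Theorem 1.7.1. Roughly speaking, $u_{n,N} = c \lambda_n^N e^{-\lambda_n}/N!$, where $c$ does not depend on $N$, and to obtain $u_n$, we sum the Poisson probabilities whose sum is 1. To do this rigorously, we divide the sum

$$u_n = \sum_{N=1}^{\infty} u_{n,N}$$

into four parts. Recall that $u = (N - \lambda_n)/\sqrt{\lambda_n}$. Let

$$S_i = \sum_{N \in A_i} u_{n,N}, \qquad i = 1, 2, 3, 4,$$

where

$$A_1 = \{N : |u| \le (\log n)^{1/4}\},$$
$$A_2 = \{N : |u| > (\log n)^{1/4},\ \alpha_0 \log n \le N \le \alpha_1 \log n\},$$
$$A_3 = \{N : N < \alpha_0 \log n\},$$
$$A_4 = \{N : N > \alpha_1 \log n\}.$$

As $n \to \infty$,

$$\sum_{N \in A_1} \frac{\lambda_n^N e^{-\lambda_n}}{N!} \to 1; \tag{1.7.12}$$

therefore it follows from Lemma 1.7.3 that

$$S_1 = \frac{\sqrt{2\pi}\, e^{3/4}}{2^{1/4}\, \Gamma(1/4)}\, n^{n - 1/4}\, (1 + o(1)).$$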
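It remains to show that $S_2$, $S_3$, and $S_4$ are $o(n^{n - 1/4})$. Lemma 1.7.4 implies that

$$S_2 \le c_1\, n^{n - 1/4} \sum_{N \in A_2} \frac{\lambda_n^N e^{-\lambda_n}}{N!},$$

and it follows from (1.7.12) that $S_2 = o(n^{n - 1/4})$.

To obtain an estimate for $S_3$, we use the inequality

$$\sum_{1 \le N \le m} \frac{\lambda^N}{N!} \le m\, \frac{\lambda^m}{m!},$$

which is true for $m \le \lambda$. Choose $\alpha_0 < 1/4$ such that $\alpha_0 - \alpha_0 \log \alpha_0 - \alpha_0 \log 4 < 1/8$. Then, for $m = \alpha_0 \log n$,

$$m\, \frac{\lambda_n^m e^{-\lambda_n}}{m!} \le c_5\, n^{-1/8},$$

where $c_5$ is a constant. By using the estimate from Lemma 1.7.6, we find that

$$S_3 \le c_4 c_5\, n^{n - 1/4 - 1/8} \log n.$$

To obtain an estimate for $S_4$, we use the inequality (1.7.13), in which $c_6$ is a constant and which follows from (1.7.7) if $P\{\xi_1 + \dots + \xi_N = n\}$ is replaced by 1. For $m > \lambda$,

$$\sum_{N \ge m} \frac{\lambda^N e^{-\lambda}}{N!} \le \frac{\lambda^m}{m!}.$$

Choose $\alpha_1 > 1/4$ such that $\alpha_1 - \alpha_1 \log \alpha_1 - \alpha_1 \log 4 < -2$. Then for $m = \alpha_1 \log n$ and $\lambda_n = (\log n)/4$, we have the estimate $\lambda_n^m/m! \le n^{-2}$; thus (1.7.13) implies that $S_4 \le c_6\, n^{n - 5/4}$.

The assertion of the theorem follows from the estimates obtained for $S_1$, $S_2$, $S_3$, and $S_4$. ∎

We denote the number of components in a random graph of $U_n$ by $\kappa_n$. The following theorem is a direct corollary of Lemma 1.7.3 and Theorem 1.7.1.

Theorem 1.7.2. As $n \to \infty$,

$$P\{\kappa_n = N\} = \frac{2}{\sqrt{2\pi \log n}}\, e^{-u^2/2}\, (1 + o(1))$$

uniformly in $N$ for which $u = (N - \tfrac{1}{4} \log n)/\sqrt{\tfrac{1}{4} \log n}$ lies in any fixed finite interval.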
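The asymptotic formula of Theorem 1.7.1 can be compared with exact counts for moderate $n$. The sketch below is an illustration under stated assumptions, not the method of the text: it computes $b_m$ from $d_m^{(r)} = m!\, m^{m-r-1}/(m-r)!$ (halved for $r \ge 3$), obtains $u_n$ from the standard recurrence on the component containing a fixed vertex, and prints the ratio of $u_n$ to $\sqrt{2\pi}\, e^{3/4}\, n^{n-1/4} / (2^{1/4}\Gamma(1/4))$. All function names are hypothetical.

```python
import math

def unicyclic_counts(n_max):
    """b[m]: unicyclic graphs on m labeled vertices, cycles of length 1 and 2 allowed.
    b_m^(r) = d_m^(r) for r = 1, 2 and d_m^(r) / 2 for r >= 3."""
    b = [0] * (n_max + 1)
    for m in range(1, n_max + 1):
        for r in range(1, m + 1):
            if r < m:
                d = (math.factorial(m) // math.factorial(m - r)) * m ** (m - r - 1)
            else:
                d = math.factorial(m - 1)      # r = m: the graph is a single cycle
            b[m] += d if r <= 2 else d // 2
    return b

def graphs_with_unicyclic_components(n_max):
    """u[n]: graphs on n labeled vertices in which every component is unicyclic.
    Recurrence: the component containing vertex 1 has k vertices."""
    b = unicyclic_counts(n_max)
    u = [1] + [0] * n_max
    for n in range(1, n_max + 1):
        u[n] = sum(math.comb(n - 1, k - 1) * b[k] * u[n - k] for k in range(1, n + 1))
    return u

def asymptotic_u(n):
    """Leading term of Theorem 1.7.1."""
    const = math.sqrt(2 * math.pi) * math.exp(0.75) / (2 ** 0.25 * math.gamma(0.25))
    return const * n ** (n - 0.25)

u = graphs_with_unicyclic_components(60)
for n in (5, 20, 60):
    print(n, u[n] / asymptotic_u(n))
```

The ratio approaches 1 only slowly, so small $n$ should be read as a sanity check on the constant rather than as evidence about the rate of convergence.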
66 The generalized scheme of allocation and the components of random graphs Indeed, Lemma 1.7.3 and Theorem 1.7.1 imply that uniformly in \u\ < (lognI/4, where Xn = ^ \ogn. We now consider the maximum size fin of the components of a random graph from Un. Theorem 1.7.3. Ifn-^-oo, then for any fixed y, 0 < y < 1, 0<s<\/y where Wq(x, y) = 1, and for s = 1,2,..., ---dxs ,y)= J[xi>Y, i=l,...,s, 7377- i+-+xs<z] xl • " -XS{Z - X\ — ¦ ¦ ¦ — Xs) I Proof. To study f}n, we use the general approach of Section 1.2. Let rj\,..., be random variables with distribution P{^i = n\, ..., t)n = n^} = P{?i = ni,..., ?jv = njv | ?1 H 1- Hn = «}• A.7.14) It follows from A.7.7) that these variables can be interpreted as the sizes of the ordered components of a random graph from Un (see Section 1.2), in which xn is N. Therefore oo JV=1 where 0 < y < 1 and rj(^) = max\<i<M Vi- By Lemma 1.2.2, \ A.7.16) where f 1,..., %N are independent identically distributed random variables for which and the random variables ?1 ,...,?# have distribution A.7.2). We now estimate
1.7 Graphs with unicyclic components 67 for* = A - \/<s/n)e~[ + l/^. By A.7.1) for any fixed y, 0 < y < 1, asn -> oo, __ ., 1 v^ tlyn (t) = — / 2 t—1 Let us prove that where 4 It is easily seen that 1 ^2 m=0 uniformly in k > yn. Therefore, as n —>¦ oo, * it Hyn(t)= ? - 1 - _ exp This sum is an integral sum of the function u-le(l~2lt)u/2 wjtj1 step \jn Hence, I C°° Hyn(t) = - / u-l ? Jy In particular, we obtain the following estimate for the tail of the distribution A.7.2): = 4Hyn@)+o(l) = 4H(y,0)+o(l) ? log n log n as n —>¦ oo. We now find the limit distribution of the sum (f i H h |jv)/aj. The character- characteristic function of fi/n is = cp(t/n)-HYn(t)/B(x) l-Hyn@)/B(x) ' Using the estimates /i) = I -log(l -2iO/log«+o(l/log/i), 4B(x) = log/i+ 0A),
68 The generalized scheme of allocation and the components of random graphs from A.7.16) and A.7.17), as /i -> oo, yields / log(l -2it)-4H(y,t) + o(\)\ ( and for any fixed t and N = \ log n + o(log n), fN{t) - <py(t) = (l-2il/ When we expand e~H{y^ into its Taylor series, as we did in the proof of Lemma 1.6.4, we find that the characteristic function <pY @ corresponds to the density {_iy Q<s<\/y Thus, for any y,0 < y < 1, the distribution of (|h r-fjv)/« converges weakly to the distribution whose density is fy (z) as n —>¦ oo and TV = | log n + o(log n). We can show that local convergence of these distributions holds. If n ->¦ oo and TV = ^ logn + o(logn) and 0 < y < 1, then A.7.18) holds uniformly in A: for which z = k/n lies in any given interval of the form 0 < zq < z < z\ < oo. Using A.7.17), we find that for n ->¦ oo and TV = ^logn + o(logn), (Ptt. < m))» = (l - 4^:0) + OA)) = e-"<^ + 0(l). A.7.19) Substituting estimates A.7.19), A.7.18), and A.7.8) into A.7.16) gives P{ri(N)<yn}= J2 (-^Ws(hy) + o(l). A.7.20) To obtain the distribution of fin, we need to average the distribution of r]^) with respect to the distribution of xn. By Theorem 1.7.2, the number of components xn is asymptotically normal with parameters (\ log n, | log n), and for TV = |logn + o(logn), the probability P^n) < y«} is asymptotically constant; therefore the assertion of the theorem follows from A.7.15). ¦ Denote by Un^ and Un^ the sets of all graphs with n labeled vertices consisting of unicyclic components where each cycle has more than one or more than two vertices, respectively. It is not difficult to see that we can treat Unj, i = 2, 3, in the same way as Un (which, following the above notation, we have to denote by
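$U_{n,1}$). The role of $B(x)$ for $U_{n,i}$, $i = 2, 3$, is played by the generating functions

$$B_i(x) = \sum_{n=1}^{\infty} \frac{b_{n,i}\, x^n}{n!}, \qquad i = 2, 3,$$

where $b_{n,i}$ is the number of unicyclic graphs with $n$ vertices and cycle lengths not less than $i$. It is clear that

$$B_2(x) = -\tfrac{1}{2} c(x) + \tfrac{1}{2} d(x) = -\tfrac{1}{4}\bigl(1 - (1 - \theta(x))^2\bigr) - \tfrac{1}{2} \log(1 - \theta(x)),$$

and for $x = (1 - 1/\sqrt{n})\, e^{-1 + 1/\sqrt{n}}$,

$$B_2(x) = \tfrac{1}{4} \log n - \tfrac{1}{4} + o(1).$$

Similarly,

$$B_3(x) = -\tfrac{1}{2} \log(1 - \theta(x)) - \tfrac{1}{2} \theta(x) - \tfrac{1}{4} \theta^2(x),$$

and for $x = (1 - 1/\sqrt{n})\, e^{-1 + 1/\sqrt{n}}$,

$$B_3(x) = \tfrac{1}{4} \log n - \tfrac{3}{4} + o(1). \tag{1.7.21}$$

Therefore, if $n \to \infty$, then for the numbers $u_n^{(i)}$ of the graphs in $U_{n,i}$ and for the number $u_{n,N}^{(i)}$ of such graphs with $N$ components, we have

$$u_n^{(i)} = A_i\, n^{n - 1/4}\, (1 + o(1)), \qquad u_{n,N}^{(i)} = A_i\, n^{n - 1/4}\, \frac{\lambda_n^N e^{-\lambda_n}}{N!}\, (1 + o(1))$$

uniformly in the integers $N$ such that $|N - \lambda_n|/\sqrt{\lambda_n}$ lies in any fixed finite interval,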
where

$$A_1 = \frac{\sqrt{2\pi}\, e^{3/4}}{2^{1/4}\, \Gamma(1/4)}, \qquad A_2 = \frac{\sqrt{2\pi}\, e^{-1/4}}{2^{1/4}\, \Gamma(1/4)}, \qquad A_3 = \frac{\sqrt{2\pi}\, e^{-3/4}}{2^{1/4}\, \Gamma(1/4)}. \tag{1.7.22}$$

Theorems 1.7.2 and 1.7.3 are valid for the random variables $\kappa_n$ and $\beta_n$ in $U_{n,2}$ and $U_{n,3}$.

1.8. Graphs with components of two types

The generalized scheme of allocation can be used in the investigations of random graphs with nonhomogeneous structure. Consider the set $A_{n,T}$ of all graphs with $n$ vertices and $T$ edges where each connected component contains no more than one cycle. As usual, we assign equal probabilities to the elements of $A_{n,T}$ and consider a random graph with values from $A_{n,T}$. Since any graph from the set $A_{n,T}$ consists of trees and unicyclic components, we can use the results of the previous sections to study various characteristics of a random graph from $A_{n,T}$.

Consider first the number of elements in $A_{n,T}$. As in the previous sections, we will denote by $a_n$ the number of graphs under consideration with $n$ vertices and by $b_n$ the number of connected graphs under consideration with $n$ vertices. Instead of $A_{n,T}$, we will use, where necessary, the notation $A_{n,T}^{(1)}$ if cycles of lengths 1 and 2 are allowed; $A_{n,T}^{(2)}$ if cycles of length 1 are forbidden; and $A_{n,T}^{(3)}$ if cycles of lengths 1 and 2 are forbidden. Denote the number of graphs in $A_{n,T}^{(i)}$ by $a_{n,T}^{(i)}$ and preserve the notation $a_{n,T}$ if the specialization is not needed.

In accordance with the previous sections, the number of forests with $n$ vertices, $T$ edges, and $N = n - T$ trees is denoted by $F_{n,N}$. We use $u_n^{(i)}$ to denote the number of graphs with $n$ vertices and unicyclic components if they are included in $A_{n,T}^{(i)}$, $i = 1, 2, 3$, and preserve the notation $u_n$ for the number of such graphs in $A_{n,T}$ if the specialization is not important. It is clear that

$$a_{n,T} = \sum_{m=0}^{T} \binom{n}{m} u_m F_{n-m,N}. \tag{1.8.1}$$

Theorem 1.8.1. If $n, T \to \infty$ such that $T/n \to 0$, then

$$a_{n,T} = F_{n,N}\, (1 + o(1)) = \frac{n^{2T}}{2^T\, T!}\, (1 + o(1)).$$

Proof. It follows from Theorem 1.7.1 that there exists a constant $c_1$ such that

$$u_m \le c_1\, m^{m - 1/4}. \tag{1.8.2}$$
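Formula (1.8.1) can be tried out on small cases. The following is a minimal sketch, assuming the case $i = 3$ (ordinary graphs without loops or double edges, so cycles have length at least 3); it is not taken from the text. Forest counts $F_{k,N}$ come from Cayley's formula $j^{j-2}$ for trees combined with a recurrence on the tree containing a fixed vertex, and $u_m$ is computed as in Section 1.7 with only the terms $r \ge 3$ kept. The function names are hypothetical.

```python
import math

def forests(n_max):
    """F[k][N]: forests on k labeled vertices with exactly N (unrooted) trees."""
    trees = [0, 1] + [j ** (j - 2) for j in range(2, n_max + 1)]   # Cayley's formula
    F = [[0] * (n_max + 1) for _ in range(n_max + 1)]
    F[0][0] = 1
    for k in range(1, n_max + 1):
        for N in range(1, k + 1):
            # the tree containing vertex 1 has j vertices
            F[k][N] = sum(math.comb(k - 1, j - 1) * trees[j] * F[k - j][N - 1]
                          for j in range(1, k + 1))
    return F

def unicyclic_simple(n_max):
    """u[m]: graphs on m labeled vertices, all components unicyclic, cycle length >= 3."""
    b = [0] * (n_max + 1)
    for m in range(3, n_max + 1):
        for r in range(3, m + 1):
            if r < m:
                d = (math.factorial(m) // math.factorial(m - r)) * m ** (m - r - 1)
            else:
                d = math.factorial(m - 1)
            b[m] += d // 2          # the cycle of a graph is undirected
    u = [1] + [0] * n_max
    for m in range(1, n_max + 1):
        u[m] = sum(math.comb(m - 1, k - 1) * b[k] * u[m - k] for k in range(1, m + 1))
    return u

def count_a(n, T):
    """a_{n,T} from (1.8.1): sum over m of C(n, m) * u_m * F_{n-m, N}, with N = n - T."""
    N = n - T
    if N < 0:
        return 0
    F, u = forests(n), unicyclic_simple(n)
    return sum(math.comb(n, m) * u[m] * F[n - m][N] for m in range(min(T, n) + 1))

print(count_a(4, 3), count_a(4, 4), count_a(5, 5))
```

For example, every simple graph with 4 vertices and 3 edges has at most one cycle per component, so `count_a(4, 3)` should equal $\binom{6}{3}$; for 5 vertices and 5 edges the only excluded graphs are those with a 4-vertex component carrying all 5 edges.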
1.8 Graphs with components of two types 71 Theorem 1.4.3 shows that under the conditions of Theorem 1.8.1, A.8.3) The condition T/n -+ 0 implies that (T - m)/(n - m) -> 0 uniformly in m, 0 < m < T. Therefore, under the conditions of Theorem 1.8.1, there exists a constant ci such that c2(n - F < forallm,0 <m <T. We obtain from A.8.1), A.8.2), A.8.3), and A.8.4) that > m=\ a,,) This completes the proof because 2Tn/(n - TJ -+ 0. Let (onj be the number of vertices contained in the unicyclic components of the random graph in Anj. It is easily seen from Theorem 1.8.1 that if n, T —>¦ oo and T/n ^ 0, then and the limit distributions of the number of trees of fixed sizes in a random graph from Anj coincide with the corresponding limit distributions in a random for- forest and are described in Theorems 1.5.1 and 1.5.2; the limit distribution of the maximum size of trees in a random graph from Anj is given in Theorem 1.6.1. Now let n, T -^ oo such that 0 = 2T/n -+ X, 0 < X < 1. According to Theorem 1.4.3, under these conditions, A.8.6) If n, T -> oo, IT In -> X, 0 < X < 1, and m = o(n), then by Theorem 1.4.3, (ti - F Since 9 = 2T/n -> X, 0 < X < 1, implies 2B" - /n)/(/i -m) <6, there exists a constant c such that )!- (L8'8)
72 The generalized scheme of allocation and the components of random graphs In subsequent proofs, we will use a cumbersome technical estimate given in the following lemma. Lemma 1.8.1. Let n, T -> oo and let there be constants Xq and X\ such that 0 < Xo < 9 = IT In < X\ < 1. Then x (\--)...(l-^Zl)< i, A.8.9) where mo < m < T and mo is sufficiently large. Proof. Write the logarithm of cnj(m) as \ogcnJ(m) = oo . m-\ oo r.rp , 1 x-^ i- x-^ 2T /m\k k=\ i=\ k=2 oo n , oo ., m-1 Using we obtain the estimate 2m /m\^ ^-^ 2T () 2Z ) 2Ztr ( (m — X)K± -s-^ (m - l) k{k +l)Tk f^ k{k + \)nk k=\ ]g(^)g( n '—' V T I \ n (=1 m-\ , . s. m) TK \ m)
1.8 Graphs with components of two types 73 To prove the assertion of the lemma, we note that for sufficiently large m, 2* / 1 1 \Hk(l) , 1 ^ \ m) 0k \ m) for all k. Indeed, since 0 < Xq < a < X\ < 1, for sufficiently large m, and therefore 1 m 6k ck<2(k+l)-2k, which implies that ck < 0 for all k > 3 and sufficiently large m. In addition, 1 ~ V m) 0 V m) ~ 0 6m m' ( 1\3 4 / 1\3 4 4 4 C2 = 6-20-1 -T-l <5-20--. + -r- + -, V m) 62 \ m) ~ 62 62m m and ci < 0, C2 < 0 for sufficiently large m, since for 0 < Ao < 0 < Ai < 1, 3-0 <0, 5-20 r<0. 6 02 ¦ Let bnj be the number of connected unicyclic graphs with n vertices that belong to A^T, i = 1, 2, 3. If this specification is of no significance, we write bn for the number of connected unicyclic graphs. Let anj{k) be the number of graphs in Anj with exactly k cycles. It is clear that an,T(k) = -^[ )Fn-m,N ?J "^ ^ (L8-10) k\ *—' \m) L—t m\! • • ~~~ m=k mi~\ \-mk=m As in Section 1.7, let oo nl «i A.8.11) 00 A « and set x = 9e~
74 The generalized scheme of allocation and the components of random graphs For such x, according to A.7.1) and A.7.2), ifl + i B2(x) = -i = -\\og{\ - 0) - ± B3(x) = - = -\\og{\ - 6)-\0-\02. Theorem 1.8.2. Ifn,T -> oo such that 9 = 2T/n -> X, 0 < X < 1, then for any i = 1, 2, 3 and any faced k = 0, 1, ..., «..r<*> = 2rm! A+OA))' where an T (k) is the number of graphs in AnT with exactly k cycles, and _ 1 XX2 2 2 4' 1 X X2 ¦- og( - )- - + —, A A2 Proof. We partition the first sum of A.8.10) into two parts, Si and S2. We set M = T1/4 and include in Si the summands with m < M. For any x from the convergence domain of the series A.8.11), the estimate ' ' ' /w>M »»H \-mic=m m>M/k holds. As in Section 1.7, let dn be the number of connected graphs of single-valued mappings of a set with n elements into itself and let 00 j vm d(x) = ml m=l Since m-\ bm<dm=(m- 1)! J2 7T - (m ~ l)[ k=o
1.8 Graphs with components of two types 75 (see the proof of Lemma 1.7.5), the estimate " m>M/k m>M/k holds. Recall that we chose x = 0e~e. According to the hypothesis of the theorem, 6 = 2T/n -> a,0 < a < 1, and there exists # < 1 suchthatex = 0el~e < q < 1 beginning with some n. Therefore y b^l<_L.qM/k A8 ^—' m! 1 — q m>M/k ^ Taking into account estimates A.8.8), A.8.9), and Lemma 1.8.1, we find that n-m,N TT > > I m>M m\-\ \-mk=m x < - Y^ V n\{n-m)^T-m)hmx---bmk r\}--n ¦¦¦[}-— +mk=m \ / \ / / 1 \ / m — l\ bm, • • -bn x. \ n / Of ( c\n2T k\2TT\ m>M (B(x) ,2T m\+- /7 V I m>M/k m —0 m I ~ k\2TT\{\-q) where c\, c2 are some constants. Thus, under the conditions of the theorem, We now estimate the sum Si. According to A.8.8), + 2T-m{T_m)[ n x V i — a ?A uniformly in m < M =
76 The generalized scheme of allocation and the components of random graphs Therefore, for any fixed k = 1,2 ^r m<M m\-\ m! )Fn-m<N—- ,2T nZI s/\-X kl2TTl M E E m\ m\\ \ ¦ ¦ - m=k m\-i \-mic=m Taking into account the estimate of S2, we obtain n Si = k\2TT\ 00 E bn ¦¦ h xmk ^m=k ¦ +mi(=m m\\ ¦ ¦ ¦ 2TTlkl Combining the estimates of Si and S2 yields n' anJik) = mk under the hypothesis of the theorem. Since x = 0e~e -> Xe~k, we also have *Ai, B2(x)-+A2, B3(x)-+A3. Theorem 1.8.3. Ifn, T -> 00 such that 6 = 2T/n -> X, 0 < X < 1, then ?A/, i = l,2,3, (/) n an,T = —2TTl - where A,-, / = 1, 2, 3, as in Theorem 1.8.2. Proof. To obtain the asymptotics of an j, we have to estimate the sum 00 A.8.14) *=0 After normalization, we have ,2T \ -1 00 / ^27 \ -] n A.8.15) A:=0 where for any fixed k = 0, 1, ..., / M2r \ -1 it!
1.8 Graphs with components of two types 77 as n, T -» oo, 2T/n -> X, 0 < A < 1. We can pass to the limit under the sum in A.8.15) if the series converges uniformly with respect to the parameters n, T. To see this, it suffices to obtain an estimate {wf\) a"-T-Ak (L8I6) such that the series YltLo ^-k converges. Using A.8.8) and A.8.9) and reasoning as we did in the proof of the estimate of 62 give m = \ m\-\ \-mk=m x 7 rJlT °° Ymh cn ^—v ^—v x om] k\2TT\ ^ ^ mi\---mk\ m=k /»H \-mic=m _ cn2T(B(k))k ~ 2TT\k\ ' Thus we have an estimate of the form A.8.16) and can pass to the limit under the sum in A.8.15) to obtain ~2TT\ Depending on the set of graphs under consideration, replace B{x) with B\{x), ), or Bt,{x), and Theorem 1.8.3 is proved. ¦ a«,T = H OVT. "exp{B(Xe-x)}(l+o(l)). A random graph from Anj has exactly N = n — T trees and a random number of unicyclic components. We denote by x^T the number of unicyclic components in a random graph from A^T, i = 1, 2, 3. Theorem 1.8.4. Ifn, T -> oo such that 6 = 2T/n -> X, 0 < X < 1, then for any i — 1, 2, 3 and for any fixed k = 0, 1,..., where the A,- are as in Theorem 1.8.2. Proof. The assertions of the theorem follow from Theorems 1.8.2 and 1.8.3, since Now we consider the case 6 = 2T/n -> 1. Let con T be the number of vertices that lie in the unicyclic components of a random graph from An T, i = 1, 2, 3. It is clear that if we know the distribution of a characteristic of the random graph
78 The generalized scheme of allocation and the components of random graphs under the condition {cc^\ = /nK tne unconditional distribution can be obtained by averaging over the distribution of con T. Theorem 1.8.5. Ifn,T -+ oo such thate = 1 - TTIn -» Oand?3n -» oo, then for any i = 1,2,3, uniformly with respect to m such that y = ?2m/2 lies in any fixed interval of the form 0 < yo < y < y\ < oo, and there exists a constant A such that, for all m, Proof. We denote the number of graphs in AJ T by a^ T and the number of graphs for which a>n T = m by a?T m. Clearly, oo m=0 A.8.18) We decompose the sum in A.8.17) into two parts. Let 0 < yo < y\ < oo, y = ?2m/2, and oo \] m=0 By Theorem 1.7.1 and the equalities A.7.21), %>ml/4 A.8.19) uniformly in m in the region yo < y < y\, where A,, i = 1, 2, 3, are defined in A.7.22). There exists a constant c\ such that, for all m, "i? < c\mm-xl\ A.8.20) To estimate Fn-m^, it is convenient to use the intermediate formula A.4.25). From A.4.26) and the equality 6B-6) = 1-e2, we have (n-m)!(l -s2)N F
1.8 Graphs with components of two types 79 where, according to Theorem 1.4.1, = k) = uniformly in k such that u = (k — N(jl)/(cr «/N) lies in any finite interval, 2 n 2 2A-e) 2-6 N S(\+SJ' If e —> 0, e3n —> oo, and m^/s/n —> 0, then for k = n — m, u — (k - Nfi) m a a A+0A)). Consequently, P{SN=n-m} = It follows from A.8.21) and A.8.22) that (n - m)\ (\ - ?2)Nen{l~s There exists a constant c such that aV2jtNP{^N = k] <c; therefore Fn-m,N < _ wA_e A.8.22) A.8.23) A.8.24) for all m, 0 < m < T. We note that as s -> 0, (l-e)mesm =e~y(l + uniformly in m such that yo < y < yi, and for all m, A - ?re?m < e~y. Clearly, A.8.23) holds uniformly in m such that y = ?2m/2 lies in the interval [.yo* y\\- Therefore, if n -> oo, e = 1 — 2T/n -> 0, and e3n -> oo, then Fn-m,N = fn(n- m)\ e~m-y{\ + o(l)) A.8.25) uniformly in m such that yo < y < y\, where fn = 2NNl(\ -e)"
80 The generalized scheme of allocation and the components of random graphs and there exists a constant Aq such that for all m, Fn-m,N < AoMn-m)\e-m-y. A.8.26) Therefore, by A.8.18), A.8.19), and A.8.25), we have the equality m-\/4 -m-y <7> =n\Aifn- ¦ A+0@) A-8.27) m\ which holds uniformly in m such that yo < y < y\; and outside of this domain, by A.8.18), A.8.20), and A.8.26), we have ?2 -y—, A.8.28) where A is a constant. The sum 1 ?2 ~2 is the integral sum of the function (F(l/4))~1z~3/4e~z with step s2/2. Therefore, by choosing yo small enough and y\ and n large enough, this sum can be made arbitrarily close to 1, and the sum for remaining values of m can be made arbitrarily small. Thus Now it follows from A.8.27) and A.8.28) that @ 2 n,T,m & —3/4 — -JT = ^J^y 3/ e un,T uniformly in m such that yo < y < yi and that outside this domain, This completes the proof of the theorem. ¦ When we substitute the exact expressions for A, and fn, we obtain for i = 1,2,3, a% = Cinl^~E ' e" . (l+o(l)), A-8.29) where C\ = e3/4, C2 = e~1/4, and C3 = e~3l4. It is easy to confirm that if
1.8 Graphs with components of two types 81 e = 1 -277/1 -» 0, then n!(l - 2T 2^AHA -e)"v/27rn Thus, under the conditions of Theorem 1.8.5, the asymptotic formulas r M2r (i) W" ,-, , *n, i = 1,2,3, 2TT] ^ are valid. Let Knj denote the number of unicyclic components in a random graph from An,r a°d use f}n>T to denote the number of vertices in the maximal unicyclic component. Theorem 1.8.6. Ifn, T —> oo such that ? = 1 — 27"/n —> 0 and s3n -> oo, then for any fixed x, P{Kn,T + - l0g? < Xyj-~ logs Proof. For any fixed x, *(*) = 1 <2tx I oo - log e < ^y - 2 00 i m=0 •>n,T =m}P \Km + Tl0g? <xJ--\0g? where xm is the number of components in a random graph from Um discussed in Section 1.7. By Theorem 1.7.2, the random variable m is asymptotically normal with parameters @, 1). Let y = ?2m/2 and 0 < yo < y < y\ < oo. Then logm = logBjF) — 2logs. Further, since e —> 0, uniformly in m such that ye [yo, y\] and does not depend on m asymptotically. In view of Theorem 1.8.5, by choosing yo small enough and y\ and n large enough, the sum
82 The generalized scheme of allocation and the components of random graphs can be made arbitrarily close to 1. Therefore for any fixed*. ¦ Consider now the maximum size of the unicyclic components. Recall that in Section 1.7 we introduced Ws(z, y), setting Wo(z, y) = 1, and dx\ ¦ ¦ dxs Ws(z,y)= f where Xs(z,y) = {xj >y, i = 1, ..., s, x\ H \-xs <z], s = 1,2 Theorem 1.8.7. Ifn, T -> oo such that ? = 1 — 2T/n -> 0 andean -> oo, then for any fixed y > 0, oo 5=0 where Proof. For any fixed y > 0, oo ~ >{conj=m}P{e2f3m <y}, m=0 where ^SOT is the maximum size of the components in a random graph from Un studied in Section 1.7. If y = s2m/2 and y e [yo, y\\, then By Theorem 1.7.3, s —\j It is clear that this holds uniformly in m such that y e [yo, y\\. Choosing a small enough yo and a large enough y i and averaging over the distribution of conj prove Theorem 1.8.7. ¦
1.8 Graphs with components of two types 83 The number of trees in any graph of Anj is N = n — T. Let r\n_ j be the maximum size of trees in a random graph from Anj. Theorem 1.8.8. Ifn,T-> oo such thate = 1 -ITIn -» Oands3n -» oo, then where ft = — logFte °), 9 = 2T/n, and u is the root of the equation mi/2 su. A.8.30) \n / Proof. It is clear that oo m T m — u < z\ ei83n m=0 Let v = s3n. It is easily seen that, under the conditions of Theorem 1.8.8, the root of equation A.8.30) can be written as -o(l). A.8.32) Let y = s2m/2 lie in a finite interval 0 < yo < y < y\ < oo. Set 2G--hi) , 2G--hi) n — m n — m 3 vm = ?^mn, film) = -\og@meOm). Since sm = e(l + o(l)), it follows from A.8.32) that the root of the equation can be written as um = loguOT - §logloguOT -log4^ + 0A) = u + uniformly in y in any fixed interval [yo, jki ] • Therefore, by applying Theorem 1.6.3, we obtain ,-e~z -m,T-m ~ U < z} -+ e~e A.8.33) uniformly in y e [yo, y\]. In the main part of the sum in A.8.31), this probabil- probability does not depend on m asymptotically. Therefore, averaging A.8.33) over the distribution of conj proves Theorem 1.8.8. ¦ When we compare Theorems 1.8.7 and 1.8.8, we see that the maximum size of trees in a random graph from Anj is greater than the maximum size of the unicyclic components, since ft = ?2/2(l + o(l)) and u -> oo. Let anj be the
84 The generalized scheme of allocation and the components of random graphs maximum size of components of a random graph from Anj, that is, unj = max($,(.7\ rinj). Averaging over the distribution of coiut gives the following theorem. Theorem 1.8.9. Ifn, T -» oo such that s = 1 —2T/n -» Oands3n -» oo, then for any fixed z, where fi = — \og(Qe~°), 9 = 2T/n, and u is the root of the equation 1/2 To conclude this section, we consider the case where n, T —> oo such that e3n tends to a constant. Theorem 1.8.10. If n,T -> oo such that snl/3 -> 2 • 3/3u, where s = 1 — IT /n and v is a constant, then for any i = 1, 2, 3, w/iere C3 = 2V2~r(l/4)' 2V2r(l/4)' 2V2r(l/4)' /•oo piv) = / y-3/4p(-v - y; 3/2, -\)dy, Jo and p(u; 3/2, -1) is the density of the stable law defined by A.4.18). Proof. We again use an,T = 2_^ I JumFn-m,N- A.8.34) According to Theorem 1.7.1, as m -> oo, um = Amm~l^{\ +o(l)), A.8.35) where the value of the coefficient A depends on the type of the unicyclic compo- components in An,r, and V2^ _ y^ _ 1 21/4r(i/4)' 2~ 21/4r(i/4)' 3~
1.8 Graphs with components of two types 85 To estimate Fn-m,Ni we use formula A.4.25) with 0=1. Then ^'^-n-m). A.8.36) where ?# = ?i + • • • + ?w is a sum of independent random variables with distri- distribution A.4.19): 2**-V» , k=l,2 k\ By Theorem 1.4.2, bN2/3P{t;N = k} = p(u; 3/2,- uniformly in k such that u = (k — 2N)/{bN2/3) lies in any fixed finite interval. Under the conditions of Theorem 1.8.9, (n-2N)/(bN2/3) -> -v. Let y = m/{bN2!3) and 0 < yo < y < y\ < oo. Then, under the conditions of the theorem, _(n-m - 2N) Thus, by A.8.36), (n-m)lp(-v-y; 3/2,-1) uniformly in m such that jk e [yo, y\\- Since Z) = 2B/3J^3, from A.8.35) and A.8.37), we obtain (n\ r \m) An\mm-llAp{-v-y; 3/2,-1) m\2NN\e-n+mbN2/3 ti + olU; -3/4^(_u - j,; 3/2, - uniformly in m such that jf € [jFo, ^l]- To obtain anj, we need to carry out the summation in A.8.35). If we choose a small enough yo and a large enough y\, substitute the expression of anjtm into A.8.34), note that the obtained sum is the integral sum of the function z~3^4p(—v — y; 3/2, —1) with step b~xn~2/3, and omit the needed estimation of the tails, we have anj = -——= / y-3/4p(-v-y;3/2,-l)dy(l+o(l)), 2N N\ ¦JN Jo
86 The generalized scheme of allocation and the components of random graphs where As/2, c = Recall our convention that if we consider the set AnT, then A is replaced by It follows from Theorem 1.8.10 that the number conj of the vertices that form the unicyclic components in a random graph of Anj has the following limit distribution: If n, T -> oo such that e = 1 - 2T/n -> 0 and s3n -> v, then bN2'3P{con,T =m} = ~^-y-3/4p(-v - y; 3/2, -1)A + o(l)) piv) uniformly in m such that y = m/(bN2^3) lies in any fixed interval of the form 0 < yo < y < y\ <oo and p(v) is defined in Theorem 1.8.10. 1.9. Notes and references In this book, we use a probabilistic approach to combinatorial problems. Section 1.1 provides the results from probability theory that suffice for the probabilistic analy- analysis presented in the book. All of the results in Section 1.1 can be found in standard treatments of probability theory; however, we follow [76], where these results are given along with full proofs. A detailed discussion of the saddle-point method can be found in [42]. The- Theorem 1.1.7 is a simplified version of the corresponding theorem that gives a full asymptotic expansion of G (X). The proof of the local limit theorem (Theorem 1.1.11) was suggested by B. V. Gnedenko and is contained in the book [49], which remains one of the best textbooks on the limit theorems of probability theory (see also [43, 122, 60]). The approximation of the binomial distribution by the normal and Poisson laws was investigated by Yu. V. Prokhorov [125] (see also [90]). The inequality from Theorem 1.1.16 was proposed by Hoeffding [59] for sums of bounded random variables (see also [122]). Section 1.2 is devoted to a description of the generalized scheme of allocation of particles, which is a generalization of the multinomial trials. It was introduced in [69] and now has a significant place in probabilistic combinatorics (see also [78]). 
Successful applications of the generalized scheme are mostly limited to the equiprobable cases; there are only a few examples where a nonequiprobable scheme has a natural combinatorial interpretation. Along with the nonequiprobable multinomial distribution, Example 1.2.3 is an example of a nonequiprobable scheme.

Example 1.2.4 concerns random forests with rooted trees and is related to branching processes. Indeed, the distribution (1.2.11) is that of the total progeny
in the Galton-Watson process $\mu(t, G)$, which begins with one particle and in which the number of offspring of each particle has a Poisson distribution. Therefore a random forest with $N$ trees and $n$ nonroot vertices can be represented by the same process that begins with $N$ particles under the condition that the total progeny is $n + N$. We describe more precisely the correspondence between random trees and the branching process $\mu(t, G)$, whose distribution of the number of offspring of one particle is the Poisson distribution with parameter $\lambda$.

Let $\mu_r(t, G)$ be the number of particles at time $t$ having exactly $r$ direct descendants, and let $\nu(G)$ be the total progeny over the whole period of evolution of the process. Consider the set $T_n$ of all rooted trees whose nonroot vertices are labeled $1, 2, \dots, n$ and whose root is labeled 0. Assigning the probability $(n + 1)^{-n+1}$ to each tree of $T_n$ gives the uniform distribution on $T_n$. Any vertex of a tree is joined to the root by a unique path, whose number of edges is called the height of the corresponding vertex. We assume that all the edges of a tree are directed away from the root and call the number of edges emanating from a vertex the degree of the vertex. Let $\mu_r(t, T_n)$, $r, t = 0, 1, \dots, n$, be the number of vertices of height $t$ having degree $r$.

Consider the matrices $\|\mu_r(t, T_n)\|$ and $\|\mu_r(t, G)\|$, $t, r = 0, 1, \dots, n$, and a matrix $M = \|m_r(t)\|$ of the same dimension with nonnegative elements. Kolchin [73] showed that
$$P\{\|\mu_r(t, T_n)\| = M\} = P\{\|\mu_r(t, G)\| = M \mid \nu(G) = n + 1\}.$$
This relation means that the distribution of any random variable that can be expressed in terms of the random variables $\mu_r(t, T_n)$, $r, t = 0, 1, \dots, n$, coincides with the conditional distribution of the corresponding random characteristic of the branching process under the condition that $\nu(G) = n + 1$.
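The probability $(n+1)^{-n+1}$ assigned to each tree above presupposes that $T_n$ contains exactly $(n+1)^{n-1}$ trees. The following Python sketch checks this count by brute force for small $n$. It is only an illustration: the function name `count_rooted_trees` and the encoding of a tree as a parent map are assumptions introduced here, not notation from the text.

```python
from itertools import product


def count_rooted_trees(n):
    """Count rooted trees on vertices 0..n with root 0 by brute force.

    A tree is encoded (hypothetical encoding for this sketch) as a parent map:
    parents[v - 1] is the parent of nonroot vertex v. The map describes a tree
    exactly when every vertex reaches the root 0 by following parents.
    """
    count = 0
    for parents in product(range(n + 1), repeat=n):
        is_tree = True
        for v in range(1, n + 1):
            u, steps = v, 0
            # In a tree the root is reached in at most n steps; more steps
            # means we are trapped in a cycle (including a self-loop).
            while u != 0 and steps <= n:
                u = parents[u - 1]
                steps += 1
            if u != 0:
                is_tree = False
                break
        if is_tree:
            count += 1
    return count


for n in range(1, 5):
    print(n, count_rooted_trees(n), (n + 1) ** (n - 1))
```

The enumeration visits all $(n+1)^n$ parent maps, so it is practical only for very small $n$; it is meant as a check of the counting fact, not as a way to sample trees.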
This scheme has been used widely to obtain a complete description of the properties of random trees and forests [73, 74, 75, 111, 112, 113, 114, 116]. Recently Yu. L. Pavlov [118, 119] discovered that the branching process that has a geometric distribution of the number of offspring corresponds, in the same sense as discussed above, to a random plane planted tree with unlabeled vertices. This representation of random plane planted trees is also mentioned in [4, 136, 138]. Note that we are aware of only these two branching processes, those with the Poisson and the geometric distributions of the number of offspring, that lead to sets of trees with uniform distribution. Results on more general classes of forests with nonuniform distributions can be found in [120, 121].

The correspondence between random plane planted trees and a branching process that has a geometric distribution appears to be deep and can be considered as a correspondence of realizations, that is, there exists a one-to-one correspondence between the set of such trees and the realizations of the corresponding
branching process. It seems that this fact was first pointed out in an explicit form by V. A. Vatutin [138].

The general approach to investigating connectivity and the sizes of components of random graphs of various types is presented in Section 1.3. This general approach was first outlined by Kolchin [78], but its particular forms had already been used to investigate other random graphs, such as random permutations, random mappings, and random forests of rooted trees [71, 72, 73, 74, 75].

Forests of nonrooted trees are investigated in Sections 1.4-1.6. Section 1.4 concerns the number of such forests. The number of forests of $N$ labeled rooted trees with $n$ nonroot vertices is $N(N + n)^{n-1}$. In contrast to the forests of rooted trees, the number $F_{n,N}$ of nonrooted forests cannot be expressed by a simple formula. A complete analysis of the random forests of nonrooted trees was conducted by V. E. Britikov, who used the generalized scheme of allocation. The possibility of using such an approach was pointed out in [78, 77]. When Britikov began investigating $F_{n,N}$, it was known only that for any fixed $N$, as $n \to \infty$,
$$F_{n,N} = \frac{n^{n-2}}{2^{N-1}(N-1)!}\,(1 + o(1)). \tag{1.9.1}$$
A complete description of the asymptotic behavior of $F_{n,N}$ can be found in [29]. In particular, formula (1.9.1) is generalized there to the case $N \to \infty$, and the asymptotics of $F_{n,N}$ are established for $n \to \infty$ and $(1 - 2T/n)^3 n \to -\infty$. The cases in which $(1 - 2T/n)^3 n$ tends to a constant and $(1 - 2T/n)^3 n \to \infty$ are covered by Theorems 1.4.4 and 1.4.3, respectively.

Section 1.5 deals with the numbers $\mu_r$ of trees with $r$ vertices, $r = 3, 4, \dots$, in a random forest. A complete description of the limit distributions of these random variables was obtained by Britikov [30]. Theorems 1.5.1 and 1.5.2 summarize the results proved in [30], where, in addition, the behavior of $\mu_1$ and $\mu_2$ is analyzed.
The general approach used to investigate the order statistics in the generalized scheme was suggested in [70] and is also described in Lemma 1.2.2 in [78]. In Section 1.6, we apply this approach to the maximum size of trees in random unrooted forests. The results of this section were obtained by Britikov [28]. Theorems 1.6.1-1.6.5 cover all possible regular variations of the parameters $n$ and $N$, but not the case where $N$ is bounded. Clearly, for any fixed $k$, the size of the $k$th largest tree of the forest can be analyzed in the same way. Luczak and Pittel [101] realized this possibility and interpreted the results of their analysis as an evolution of a random forest (see also [31]).

It is pertinent to note here the results that concern the investigations of the ordered series of components of wide classes of random graphs [4, 7, 14, 15, 35, 36, 41, 56]. There are two natural ways of labeling the components. One way is to
arrange them in decreasing order; the other is to use a particular random labeling called the size-biased permutation. For the first type of labeling, let $M_1 \ge M_2 \ge \cdots$ be the sequence of sizes of the components of a graph with $n$ vertices numbered in decreasing order. For the second, let $C_1$ be the size of the component that contains the vertex with label 1, let $C_2$ be the size of the component that contains the vertex with the smallest label among the vertices not included in the first component, and so on.

It is clear that the joint distribution of the random variables $C_1, C_2, \dots$ normalized by $n$ places unit mass on the set $\Delta$ of infinite sequences of nonnegative numbers,
$$\Delta = \{(x_1, x_2, \dots):\ x_1 + x_2 + \cdots = 1\},$$
and the joint distribution of $M_1, M_2, \dots$ normalized by $n$ is concentrated on the set
$$\nabla = \{(x_1, x_2, \dots) \in \Delta:\ x_1 \ge x_2 \ge \cdots\}.$$
For some classes of graphs, the limit distributions of the sequences $C_1, C_2, \dots$ and $M_1, M_2, \dots$ are known.

Let us describe a class of the limit distributions. Let $Z_1, Z_2, \dots$ be independent identically distributed random variables with density
$$\theta(1 - z)^{\theta - 1}, \qquad 0 < z < 1, \quad \theta > 0.$$
Let
$$Y_1 = Z_1, \quad Y_2 = Z_2(1 - Z_1), \quad Y_3 = Z_3(1 - Z_1)(1 - Z_2), \quad \dots$$
and let $Y_{(1)}, Y_{(2)}, \dots$ be the order statistics constructed from $Y_1, Y_2, \dots$. The distribution of $Y_1, Y_2, \dots$ on $\Delta$ is called the GEM distribution with parameter $\theta$, and the distribution of $Y_{(1)}, Y_{(2)}, \dots$ on $\nabla$ is called the Poisson-Dirichlet distribution with parameter $\theta$.

It is known that the distribution of the random variables $M_1, M_2, \dots$ normalized by $n$ for the cycle sizes of a random permutation of degree $n$ converges, as $n \to \infty$, to the Poisson-Dirichlet distribution with parameter $\theta = 1$ and that the random variable $C_1$ is uniformly distributed on the set $\{1, \dots, n\}$ (see, for example, [78]).
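The construction of the GEM sequence from the variables $Z_1, Z_2, \dots$ described above is often called stick breaking, and it can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the function name `gem_sample` and the truncation level `k` are choices made here, and the density $\theta(1-z)^{\theta-1}$ is sampled as a Beta$(1, \theta)$ variable. Sorting a truncated sequence only approximates the Poisson-Dirichlet order statistics, since the true ones are built from the whole infinite sequence.

```python
import random


def gem_sample(theta, k, rng):
    """Return the first k terms Y_1, ..., Y_k of a GEM(theta) sequence.

    Z_j has density theta * (1 - z)**(theta - 1) on (0, 1), which is the
    Beta(1, theta) law, and Y_j = Z_j * (1 - Z_1) * ... * (1 - Z_{j-1}).
    """
    remaining = 1.0  # length of the stick not yet broken off
    ys = []
    for _ in range(k):
        z = rng.betavariate(1.0, theta)
        ys.append(z * remaining)
        remaining *= 1.0 - z
    return ys


rng = random.Random(0)
ys = gem_sample(1.0, 50, rng)       # theta = 1: the permutation case
ranked = sorted(ys, reverse=True)   # truncated stand-in for Y_(1), Y_(2), ...
print(sum(ys))                      # close to 1: almost no stick is left
print(ranked[:5])
```

With $\theta = 1$ the variables $Z_j$ are uniform on $(0, 1)$, which corresponds to the random permutation case mentioned above; $\theta = 1/2$ would correspond to the parameter quoted for random mappings.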
For random mappings, the distributions of the random variables $C_1, C_2, \dots$ and $M_1, M_2, \dots$ normalized by $n$ converge, respectively, to the GEM distribution and the Poisson-Dirichlet distribution with parameter $\theta = 1/2$ [3].

As usual, let $\alpha_r$ denote the number of components of size $r$ of a random graph with $n$ vertices. The joint distribution of the random variables $\alpha_1, \dots, \alpha_n$ of the form
$$P\{\alpha_1 = a_1, \dots, \alpha_n = a_n\} = \frac{n!}{\theta(\theta + 1)\cdots(\theta + n - 1)} \prod_{r=1}^{n} \Big(\frac{\theta}{r}\Big)^{a_r} \frac{1}{a_r!},$$
where $a_1, \dots, a_n$ are nonnegative integers such that $a_1 + 2a_2 + \cdots + na_n = n$, is similar to the joint distribution of the random variables $\alpha_1, \dots, \alpha_n$ for a random permutation (see Lemma 1.3.7). This distribution arises frequently in population genetics and is known as the Ewens distribution [40, 67]. If the random variables $C_1, C_2, \dots$ and $M_1, M_2, \dots$ correspond to a graph with the Ewens distribution of $\alpha_1, \dots, \alpha_n$ with parameter $\theta$, then as $n \to \infty$, the distributions of the normalized random variables converge, respectively, to the GEM distribution and the Poisson-Dirichlet distribution with the same parameter $\theta$ [67]. See also [139, 140, 141].

Section 1.7 contains the results on unicyclic random graphs obtained in [77]. The analysis of random graphs with components of two types presented in Section 1.8 is also contained in [77]. The idea of considering a graph as a combination of connected components of certain types can be attributed to Agadzhanyan [1, 2]. The results of Section 1.8 can be found in [77].
Evolution of random graphs

2.1. Subcritical graphs

This chapter deals with several models of random graphs with $n$ labeled vertices and $T$ edges as $n, T \to \infty$. The parameter $\theta = 2T/n$ plays a decisive role in the behavior of random graphs, and it may be interpreted as time in the evolution of the graphs. It turns out that many of the characteristics change their behavior abruptly near the point $\theta = 1$. It is convenient to distinguish three domains of the variation of the parameter $\theta$. We say that a random graph is subcritical if $n, T \to \infty$ in such a way that $(1 - \theta)^3 n \to \infty$. Thus, for a subcritical graph, $\theta$ may tend to unity, but not too fast. A critical graph is characterized by the conditions that $n, T \to \infty$ and $(1 - \theta)^3 n$ tends to a constant. And, finally, a graph is supercritical if $n, T \to \infty$ and $(1 - \theta)^3 n \to -\infty$.

In this section we consider three sets of graphs. Let $\mathcal{G}^{(1)}_{n,T}$ be the set of all graphs with $n$ labeled vertices and $T$ edges with loops and multiple edges, provided each vertex may have no more than one loop and each pair of vertices may be connected by no more than two edges. Let $\mathcal{G}^{(2)}_{n,T}$ be the set of all graphs with $n$ labeled vertices and $T$ edges that have no loops; however, each edge may occur twice, so that each pair of vertices may be connected by no more than two edges. And, finally, let $\mathcal{G}^{(3)}_{n,T}$ be the set of all graphs with $n$ labeled vertices and $T$ edges that have neither loops nor multiple edges.

Denote the number of graphs in $\mathcal{G}^{(i)}_{n,T}$ by $g^{(i)}_{n,T}$, $i = 1, 2, 3$. We introduce the uniform distribution on $\mathcal{G}^{(i)}_{n,T}$, $i = 1, 2, 3$, assigning equal probabilities to all elements of the corresponding set, and denote by $G^{(i)}_{n,T}$ a random graph such that
$$P\{G^{(i)}_{n,T} = G\} = \big(g^{(i)}_{n,T}\big)^{-1}$$
for any $G \in \mathcal{G}^{(i)}_{n,T}$, $i = 1, 2, 3$.
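Recall that in Section 1.8 we considered the sets $A^{(i)}_{n,T}$, $i = 1, 2, 3$, of all graphs with $n$ labeled vertices and $T$ edges with components of two types: trees and unicyclic components. In $A^{(3)}_{n,T}$, the unicyclic components have neither loops nor multiple edges; in $A^{(2)}_{n,T}$, the unicyclic components have no loops, but may contain cycles of length 2; and in $A^{(1)}_{n,T}$, the unicyclic components may contain loops and cycles of length 2. Thus,
$$A^{(i)}_{n,T} \subset \mathcal{G}^{(i)}_{n,T}, \qquad i = 1, 2, 3.$$
The results of Section 1.8 allow us to describe the limit distributions of various characteristics of subcritical random graphs $G^{(i)}_{n,T}$, $i = 1, 2, 3$.

Theorem 2.1.1. If $n, T \to \infty$ such that $(1 - 2T/n)^3 n \to \infty$, then for any $i = 1, 2, 3$,
$$P\{G^{(i)}_{n,T} \in A^{(i)}_{n,T}\} \to 1.$$

Proof. It is clear that
$$P\{G^{(i)}_{n,T} \in A^{(i)}_{n,T}\} = \frac{a^{(i)}_{n,T}}{g^{(i)}_{n,T}}. \tag{2.1.1}$$
We need to determine the asymptotics of $g^{(i)}_{n,T}$, $i = 1, 2, 3$, under the conditions of Theorem 2.1.1 to match the results on $a^{(i)}_{n,T}$ from Section 1.8. Recall that if $\theta = 2T/n \to \lambda$, $0 < \lambda < 1$, then by Theorems 1.8.1, 1.8.2, and assertion (1.8.29), for any $i = 1, 2, 3$,
$$a^{(i)}_{n,T} = \frac{n^{2T} c_i(\theta)}{2^T T!}\,(1 + o(1)),$$
where
$$c_1(\theta) = e^{\theta/2 + \theta^2/4}, \qquad c_2(\theta) = e^{-\theta/2 + \theta^2/4}, \qquad c_3(\theta) = e^{-\theta/2 - \theta^2/4}.$$
If $n, T \to \infty$ and $T^3/n^4 \to 0$, then
$$g^{(3)}_{n,T} = \binom{n(n-1)/2}{T} = \frac{(n(n-1))^T}{2^T T!} \Big(1 - \frac{2}{n(n-1)}\Big)\Big(1 - \frac{4}{n(n-1)}\Big) \cdots \Big(1 - \frac{2(T-1)}{n(n-1)}\Big) = \frac{n^{2T} e^{-T/n - T^2/n^2}}{2^T T!}\,(1 + o(1)), \tag{2.1.2}$$
and Theorem 2.1.1 is proved for $i = 3$.

It is clear that each graph from $\mathcal{G}^{(2)}_{n,T}$ can be obtained by a choice of $T$ edges, which is equivalent to an allocation of $T$ particles into $\binom{n}{2}$ cells, provided each cell contains no more than two particles. Therefore
$$g^{(2)}_{n,T} = \sum_{t_1 + 2t_2 = T} \binom{S}{t_1}\binom{S - t_1}{t_2},$$
where $S = \binom{n}{2}$, $t_1$ cells have exactly one particle, and $t_2$ cells have two particles. Hence,
$$g^{(2)}_{n,T} = \sum_{t_1 + 2t_2 = T} \frac{S!}{t_1!\, t_2!\, (S - t_1 - t_2)!} = \binom{S}{T} \sum_{t} \frac{T!\,(S - T)!}{t!\,(T - 2t)!\,(S - T + t)!}.$$
For any fixed $t$,
$$\frac{T!\,(S - T)!}{(T - 2t)!\,(S - T + t)!} = \Big(\frac{2T^2}{n^2}\Big)^{t} (1 + o(1)).$$
Therefore, under the conditions of Theorem 2.1.1,
$$g^{(2)}_{n,T} = \frac{n^{2T} e^{-T/n - T^2/n^2}}{2^T T!} \sum_{t=0}^{\infty} \frac{1}{t!}\Big(\frac{2T^2}{n^2}\Big)^{t} (1 + o(1)) = \frac{n^{2T} e^{-T/n + T^2/n^2}}{2^T T!}\,(1 + o(1)). \tag{2.1.3}$$

Similarly, each graph from $\mathcal{G}^{(1)}_{n,T}$ can be obtained by a choice of $T$ edges, which is equivalent to an allocation of $T$ particles into $n + \binom{n}{2}$ cells, provided that no more than two particles are allocated into each of $\binom{n}{2}$ cells and only one particle may be put into each of $n$ cells. Therefore, putting $S = \binom{n}{2}$ yields
$$g^{(1)}_{n,T} = \sum_{t_0 + t_1 + 2t_2 = T} \binom{n}{t_0} \frac{S!}{t_1!\, t_2!\, (S - t_1 - t_2)!}.$$
By the same arguments, under the conditions of Theorem 2.1.1,
$$g^{(1)}_{n,T} = \frac{n^{2T} e^{T/n + T^2/n^2}}{2^T T!}\,(1 + o(1)). \tag{2.1.4}$$
Then, by comparing (2.1.1) to (2.1.2), (2.1.3), and (2.1.4), we obtain the assertion of the theorem.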
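The asymptotic formula (2.1.2), $g^{(3)}_{n,T} = \binom{n(n-1)/2}{T} \approx n^{2T} e^{-T/n - T^2/n^2} / (2^T T!)$ when $T^3/n^4 \to 0$, can be checked numerically by comparing logarithms. The Python sketch below is only an illustration; the function names and the choice $T = 0.4n$ are assumptions made here.

```python
import math


def log_g3_exact(n, T):
    """Logarithm of the exact count binom(n(n-1)/2, T) of simple graphs."""
    S = n * (n - 1) // 2
    return math.log(math.comb(S, T))


def log_g3_asymptotic(n, T):
    """Logarithm of n^(2T) * exp(-T/n - T^2/n^2) / (2^T * T!)."""
    return (2 * T * math.log(n) - T / n - T**2 / n**2
            - T * math.log(2) - math.lgamma(T + 1))


for n in (200, 2000):
    T = int(0.4 * n)  # theta = 2T/n = 0.8, a subcritical value
    diff = log_g3_exact(n, T) - log_g3_asymptotic(n, T)
    print(n, T, diff)  # the log-ratio should be close to 0
```

Dropping the factor $e^{-T/n - T^2/n^2}$ would shift the log-ratio by $T/n + T^2/n^2 = 0.56$ for these parameters, so the comparison is sensitive to the exponential correction and not only to the leading term.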
According to Theorem 2.1.1, each of the subcritical graphs $G^{(i)}_{n,T}$, $i = 1, 2, 3$, consists of trees and unicyclic components and, with probability tending to 1, does not contain more complicated components.

Given a random graph $G$, denote by $\mu_r(G)$ the number of trees of size $r$, by $\eta(G)$ the maximum size of trees, by $\omega(G)$ the total number of vertices in the unicyclic components, by $\varkappa(G)$ the number of unicyclic components, by $\beta(G)$ the maximum size of the unicyclic components, and by $\alpha(G)$ the maximum size of the components.

Let $\gamma(G^{(i)}_{n,T})$ be a characteristic of the random graph $G^{(i)}_{n,T}$ and let $\gamma^{(i)}_{n,T}$ be the corresponding characteristic of the random graph from $A^{(i)}_{n,T}$. Then, by the formula of total probability,
$$P\{\gamma(G^{(i)}_{n,T}) \le x\} = P\{G^{(i)}_{n,T} \in A^{(i)}_{n,T}\}\, P\{\gamma^{(i)}_{n,T} \le x\} + P\{G^{(i)}_{n,T} \notin A^{(i)}_{n,T}\}\, P\{\gamma(G^{(i)}_{n,T}) \le x \mid G^{(i)}_{n,T} \notin A^{(i)}_{n,T}\}$$
for any $x$. By Theorem 2.1.1,
$$P\{G^{(i)}_{n,T} \in A^{(i)}_{n,T}\} \to 1$$
if the graph $G^{(i)}_{n,T}$ is subcritical. Therefore, for any characteristic $\gamma(G^{(i)}_{n,T})$ of the subcritical graph,
$$P\{\gamma(G^{(i)}_{n,T}) \le x\} = P\{\gamma^{(i)}_{n,T} \le x\}(1 + o(1)) + o(1), \tag{2.1.5}$$
and if $P\{\gamma^{(i)}_{n,T} \le x\}$ tends to a limit, then the probability $P\{\gamma(G^{(i)}_{n,T}) \le x\}$ has the same limit. Thus, many of the results of Section 1.8 can be reformulated for the corresponding characteristics of the random graphs $G^{(i)}_{n,T}$, $i = 1, 2, 3$.

If $\gamma(G^{(i)}_{n,T})$ is an integer-valued characteristic, then for any fixed integer $k$,
$$P\{\gamma(G^{(i)}_{n,T}) = k\} = P\{\gamma^{(i)}_{n,T} = k\}(1 + o(1)) + o(1), \tag{2.1.6}$$
and if $P\{\gamma^{(i)}_{n,T} = k\}$ has a nonzero limit, then relation (2.1.6) allows us to obtain the limit of the probability $P\{\gamma(G^{(i)}_{n,T}) = k\}$.

Theorem 2.1.2. If $n, T \to \infty$ such that $T/n \to 0$, then for any $i = 1, 2, 3$,

If $n, T \to \infty$ such that $\varepsilon = 1 - 2T/n \to 0$ and $\varepsilon^3 n \to \infty$, then for any fixed $x > 0$ and any $i = 1, 2, 3$,

Proof. The assertions of the theorem follow from (2.1.5), (2.1.6), and Theorems 1.8.1 and 1.8.5.
2.1 Subcritical graphs 95 Theorem 2.1.3. If the graph G^\ is subcritical, i = 1, 2, 3, andr = r(n,T) > 3 varies such that NprF) -> oo, then for any faced x, - Npr@) where N = n-T, 0 = IT jn, < x — (' Pr(d)(l - Pr(d) -{IX- kJPr(d) Orr\p) = -z B-0)' 2 cr = Ifr = r(n,T)>3 varies such that Npr{d) —>• A., 0 < A. < oo, then for any fixed k = 0,1,..., Proof. In view of B.1.5) and B.1.6), the assertion of the theorem follows from Theorems 1.5.1 and 1.5.2 because, by Theorem 2.1.2, the number co(G^T) of vertices in the unicyclic components for subcritical graphs is small compared with the total number of vertices; more precisely, P{co(Glrt T) < n2/3} —>• 1. ¦ Theorem 2.1.4. Ifn,T —>• oo such that T/n —>• 0, r = r(n,T) > 1 and Npr{6) ->¦ oo, Npr+i(d) ->¦ A, 0 < A < oo, then for any i = 1, 2, 3, Proof. In view of B.1.5) and B.1.6), the assertions of the theorem follow from Theorem 1.6.1. ¦ Theorem 2.1.5. If i = 1,2,3 and n,T -+ oo such that 0 = 2T/n -+ A, 0 < A < 1, then for any faed k = 0, 1,..., _ Ake~Al
96 Evolution of random graphs where A| = A2 = 1 2 ? 1 ~2l0^ X ;A -A.) H 1 Kl-X)--H 1 4 X2 1 X X2 A3 = --log(l-X)----. For any fixed k = 0, ± 1,..., = exp { - where _ logn - E/2) log log n 0 - 1 - log 0 ' [a] and {a} are, respectively, the integer and fractional parts of a. Proof. The assertions of the theorem follow from B.1.5), B.1.6), and Theo- Theorems 1.8.4 and 1.6.2. ¦ Theorem 2.1.6. Ifi = 1, 2, 3 and n, T -+ oo such that ? = 1 - 2T/n -> 0 and ?3n —>• oo, then for any fixed x, and for any fixed x > 0, oo is defined in Theorem 1.8.7. Finally, for any fixed z, - u < log^e*), 0 = 27/n, anJ w w the root of the equation /o\ 1/2 ^ u. B.1.7) Proof. The results of the theorem are the consequences of B.1.5), B.1.6), and Theorems 1.8.6, 1.8.7, 1.8.8, and 1.8.9. ¦
2.2. Critical graphs

Recall that a graph with $n$ vertices and $T$ edges is called critical if $n, T \to \infty$ such that $\varepsilon = 1 - 2T/n \to 0$ and $\varepsilon^3 n$ tends to a constant. We have seen that many of the characteristics of the random graphs $G^{(i)}_{n,T}$, $i = 1, 2, 3$, change their behavior if $\theta = 2T/n$ approaches the value 1. For example, the number of cycles, or the number of unicyclic components $\varkappa(G^{(i)}_{n,T})$, tends to zero in probability if $\theta \to 0$; has in the limit the Poisson distribution with parameter $\Lambda_i$, $i = 1, 2, 3$, respectively, if $\theta \to \lambda$, $0 < \lambda < 1$, where
$$\Lambda_1 = -\frac{1}{2}\log(1 - \lambda) + \frac{\lambda}{2} + \frac{\lambda^2}{4}, \qquad \Lambda_2 = -\frac{1}{2}\log(1 - \lambda) - \frac{\lambda}{2} + \frac{\lambda^2}{4}, \qquad \Lambda_3 = -\frac{1}{2}\log(1 - \lambda) - \frac{\lambda}{2} - \frac{\lambda^2}{4};$$
and is asymptotically normal with parameters $\big({-\tfrac{1}{2}}\log\varepsilon,\ {-\tfrac{1}{2}}\log\varepsilon\big)$ if $\varepsilon \to 0$, $\varepsilon^3 n \to \infty$. Thus, $\theta = 1$ is a singular point and one can correctly suppose that the behavior of the graphs near this point is interesting but difficult to investigate. Indeed, not much is known about the properties of critical graphs. We present here only one assertion about this behavior.

Recall that $A^{(i)}_{n,T}$ is the set of graphs with $n$ labeled vertices and $T$ edges that consist of trees and unicyclic components with neither loops nor multiple edges for $i = 3$, without loops and with cycles of length 2 allowed for $i = 2$, and with cycles of lengths 1 and 2 allowed for $i = 1$.

Theorem 2.2.1. If $n, T \to \infty$ such that $\varepsilon n^{1/3} \to 2 \cdot 3^{-2/3} v$, where $v$ is a constant, then for any random graph $G^{(i)}_{n,T}$, $i = 1, 2, 3$,

$$p(v) = \int_0^\infty y^{-3/4}\, p(-v - y;\, 3/2, -1)\, dy$$
and $p(y; 3/2, -1)$ is the density of the stable law, introduced in Theorem 1.4.2, with the characteristic function

Proof. It is clear that
$$P\{G^{(i)}_{n,T} \in A^{(i)}_{n,T}\} = \frac{a^{(i)}_{n,T}}{g^{(i)}_{n,T}},$$
98 Evolution of random graphs where a^\ is the number of graphs in A^T, and g^T is the number of graphs in C?(i) / = 1, 2, 3. In accordance with Theorem 1.8.10, where N = n — T, -1/4 V&-3/4 C\ = 7= , C2 = ;= , C\ = In the previous section, we proved that @ = n2Ta(l) gn,T 2TT\ A+0A)), where ci(l) = e3/4, c2(l) = e~1/4, c3(l) = e~3/4. Since 7 = n(l — e)/2 and ?3n —>• 8u3/9, we easily find and, consequently, The function p(u) can be represented by a convergent power series. The function /•OO = p(-v)= / (v -y; 3/2, -1) dy gi(y) = can be thought of as the convolution of the function y~3/\ y>0, 0, y<0 and the function g2(y) = p(y; 3/2, -1), so that /•OO g(v) = / g\(y)gl(v - -OO Therefore the Fourier transform g(t) of the function g(v) is the product of the Fourier transforms of the functions g\ (y) and g2(y). The Fourier transform g\ it) of the function g\ (y) is gi(t) =
2.2 Critical graphs 99 and the Fourier transform gj{t) of the function gj(y) = p(y\ 3/2, —1) is the characteristic function of this density: Thus, By the inversion formula, '00 ,—itv Xi 1 f° g(v) = — / e-ltvg(t)dt ¦^TT J—oo = f e\t\e V2rC/4) y-oo and therefore, under the hypotheses of Theorem 2.2.1, v^r(i/4)V2rC/4) where h(v) = —OO Since T A/4) T C/4) = V2tt, we obtain The function /z (u) can be represented by a convergent power series. Theorem 2.2.2. Ifn,T —>• oo such that en1/3 —>• 2 • 3~2/3u, where v is a con- constant, then for any random graph Gnr, i = 1, 2, 3, AC — (J Proof. Let us represent h{v) by a power series in v. Since the left-hand side of B.2.1) is real, J—oo
100 Evolution of random graphs Consider first the integral roo h[(v) = / e"vrl'*ein'*exp{-t3/2el"/4\dt. Jo By expanding e'tv, we obtain h, (iO = e1*'* f] ^ r tk~[/4 exp j - W 4}</'- k=0 After the change of variables t3^2ein^4 = z, we obtain Therefore Similarly, for i-c -r Jo we obtain The assertion of the theorem follows from B.2.1), B.2.2), and B.2.3). ¦ Theorem 2.2.2 allows us to calculate the limit values of P{G^ T e An T}. For example, = ^273. Some values of P(v) are given in Table 2.1. 2.3. Random graphs with independent edges When we were determining the number of graphs in the classes Q^TJ = 1, 2, 3, in Section 2.1, we associated each of the classes with the corresponding equiprobable scheme of allocating particles into cells. It is easily seen from these correspon- correspondences that the realizations of each of the random graphs G^\, i = 1, 2, 3, could be obtained by a sequential allocation of particles, but these random allocations are dependent. For example, if a pair of vertices has been connected in the random
graph $G^{(3)}_{n,T}$ after allocating some of the edges, then the outcomes of all subsequent allocations cannot be the edges connecting these two vertices.

Table 2.1. Values of P(v)

    v      P(v)        v      P(v)        v      P(v)
  -3.0    0.0053     -1.0    0.4919      1.2    0.9563
  -2.8    0.0118     -0.8    0.5727      1.4    0.9653
  -2.6    0.0239     -0.6    0.6470      1.6    0.9722
  -2.4    0.0443     -0.4    0.7128      1.8    0.9776
  -2.2    0.0755     -0.2    0.7693      2.0    0.9819
  -2.0    0.1196      0.2    0.8551      2.2    0.9852
  -1.8    0.1768      0.4    0.8860      2.4    0.9878
  -1.6    0.2461      0.6    0.9105      2.6    0.9899
  -1.4    0.3244      0.8    0.9297      2.8    0.9915
  -1.2    0.4078      1.0    0.9447      3.0    0.9929

The classes of random graphs whose edges are independent seem to be easier to investigate by using the methods of probability theory. The best-known random graph with this property is $G_{n,p}$ with $n$ vertices such that each of the $\binom{n}{2}$ possible edges belongs to the edge set of $G_{n,p}$ with probability $p$ independently of the behavior of the other edges. This graph has a random number of edges with the binomial distribution with $\binom{n}{2}$ trials and the probability of success $p$.

In this section, we consider the random graph $G_{n,T}$ with $n$ vertices labeled $1, \dots, n$ and $T$ edges that can be obtained by $T$ independent trials. In each trial, the loop at any vertex $i$ occurs with probability $n^{-2}$ and the edge connecting the vertices $i$ and $j$, $i \ne j$, occurs with probability $2n^{-2}$. In other words, if the edge set of $G_{n,T}$ consists of $T$ edges $(i(1), j(1)), \dots, (i(T), j(T))$, then $i(1), j(1), \dots, i(T), j(T)$ are independent identically distributed random variables taking the values $1, 2, \dots, n$ with equal probabilities. It is clear that the realizations of the random graph $G_{n,T}$ are not equiprobable. For example, for $n = 2$ and $T = 1$, the graphs with a loop and an isolated vertex have the probabilities 1/4 each, and the connected graph has the probability 1/2. Nevertheless, this model has some advantages and is conducive to treatment by probabilistic methods.
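The model $G_{n,T}$ just described, in which the $2T$ endpoints are drawn independently and uniformly from $\{1, \dots, n\}$, is easy to simulate, and for very small $n$ and $T$ its exact distribution can be obtained by enumeration. The Python sketch below reproduces the example with $n = 2$ and $T = 1$. The function names and the representation of a graph as a sorted tuple of unordered edges are assumptions made for this illustration.

```python
import random
from collections import Counter
from fractions import Fraction
from itertools import product


def sample_graph(n, T, rng):
    """Draw the T edges of G_{n,T}: both endpoints of every edge are
    independent and uniform on 1..n, so loops and repeats may occur."""
    return [(rng.randint(1, n), rng.randint(1, n)) for _ in range(T)]


def exact_distribution(n, T):
    """Exact law of the unordered edge multiset of G_{n,T}, obtained by
    enumerating all n**(2T) equally likely endpoint sequences."""
    total = n ** (2 * T)
    counts = Counter()
    for endpoints in product(range(1, n + 1), repeat=2 * T):
        edges = [tuple(sorted(endpoints[2 * k:2 * k + 2])) for k in range(T)]
        counts[tuple(sorted(edges))] += 1
    return {graph: Fraction(c, total) for graph, c in counts.items()}


dist = exact_distribution(2, 1)
for graph, prob in sorted(dist.items()):
    print(graph, prob)  # two loop graphs with 1/4 each, the edge {1,2} with 1/2

print(sample_graph(5, 3, random.Random(1)))
```

The enumeration grows as $n^{2T}$ and is meant only to confirm that realizations are not equiprobable; for larger parameters one would use `sample_graph` and estimate probabilities by simulation.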
Since $i(1), j(1), \dots, i(T), j(T)$ are independent identically distributed random variables, we can associate to the random graph $G_{n,T}$ the classical scheme of allocating particles where $2T$ particles are allocated into $n$ cells such that each particle falls into any of $n$ cells with probability $1/n$ independently of the allocations of the other particles. By using this relationship, we can, for example, easily find the distribution of the number of loops in $G_{n,T}$. Indeed, we have $T$ trials, corresponding to $T$ edges, and in each of these trials a loop appears with
probability $1/n$. Thus, the total number of loops $\alpha_1$ in $G_{n,T}$ has the binomial distribution with parameters $(T, 1/n)$. The mean number of loops is $E\alpha_1 = T/n$. If $2T/n \to \lambda$, $0 < \lambda < \infty$, then the Poisson distribution with parameter $\lambda/2$ is the limit distribution for $\alpha_1$.

Under the condition $\alpha_1 = m$, the other edges may be considered as the result of $T - m$ independent allocations into $\binom{n}{2}$ cells corresponding to $\binom{n}{2}$ possible edges of the complete graph with $n$ vertices. Therefore, with $\alpha_1 = m$, the number $\alpha_2$ of cycles of length 2 in $G_{n,T}$ can be thought of as the number of cells with exactly two particles in the classical (equiprobable) scheme of allocation of $T - m$ particles into $\binom{n}{2}$ cells.

The classical scheme of allocation has been well studied. In particular, if $n, T \to \infty$ such that $2T/n \to \lambda$, $0 < \lambda < \infty$, then the distribution of the number of cells occupied by exactly two particles each converges to the Poisson distribution with parameter $\lambda^2/4$. Since the limit distribution does not depend on $m$ for $m = o(n)$, averaging over the distribution of $\alpha_1$ shows that $\alpha_1$ and $\alpha_2$ are asymptotically independent and their distributions approach the Poisson distributions.

Theorem 2.3.1. If $n, T \to \infty$ such that $2T/n \to \lambda$, $0 < \lambda < \infty$, then for any fixed nonnegative integers $k_1$ and $k_2$,
$$P\{\alpha_1 = k_1,\ \alpha_2 = k_2\} = \frac{1}{k_1!\, k_2!} \Big(\frac{\lambda}{2}\Big)^{k_1} \Big(\frac{\lambda^2}{4}\Big)^{k_2} e^{-\lambda/2 - \lambda^2/4}\,(1 + o(1)).$$

Because the edges of $G_{n,T}$ are independent, we can apply direct probabilistic approaches to investigations of the structure of $G_{n,T}$.

Theorem 2.3.2. If $n, T \to \infty$ such that $T/n \to 0$, then in $G_{n,T}$, with probability tending to 1, there are no cycles and all the components are trees.

Proof. Denote the number of cycles of length $r$ with $r$ distinct vertices by $\alpha_r$, and let $\nu(G_{n,T}) = \alpha_1 + \cdots + \alpha_n$ be the total number of cycles considered as induced subgraphs of $G_{n,T}$. We can represent $\alpha_r$ as a sum of indicators. The edges of $G_{n,T}$ appear sequentially in $T$ trials.
We assign the numbers $1, 2, \dots, T$ to the trials and arrange (in some order) all $\binom{T}{r}$ possible subsets of cardinality $r$ of the trial numbers. We define the random variable $\xi_i$ to be equal to 1 if the subset of trial numbers labeled with $i$ forms a cycle in $G_{n,T}$, and $\xi_i = 0$ otherwise. It is clear that
$$\alpha_r = \xi_1 + \cdots + \xi_{\binom{T}{r}}.$$
In turn, each of the random variables $\xi_1, \dots, \xi_{\binom{T}{r}}$ can be represented as a sum of indicators. The cycle corresponding to the subset with label $i$ can be constructed from $r$ different vertices and $r$ different edges. There exist $\binom{n}{r}$ possibilities to choose these $r$ vertices and $(r - 1)!/2$ possibilities to construct a cycle from these $r$ vertices for $r \ge 3$. Each construction fixes $r$ edges that must occur. These $r$ edges
can occur at $r$ fixed places of the subset labeled $i$, and there exist $r!$ possibilities to assign these $r$ edges to $r$ places. Thus the event $\{\xi_i = 1\}$ can be realized by one of the $\binom{n}{r}(r - 1)!\, r!/2$ variants. For $r \ge 3$, each of these variants has the probability $(2/n^2)^r$. Thus,
$$P\{\xi_i = 1\} = \binom{n}{r} \frac{(r - 1)!\, r!}{2} \Big(\frac{2}{n^2}\Big)^{r}. \tag{2.3.1}$$
It is not difficult to check that this formula is also valid for $r = 1$ and $r = 2$. It follows from (2.3.1) that
$$E\alpha_r = \binom{T}{r}\binom{n}{r} \frac{(r - 1)!\, r!}{2} \Big(\frac{2}{n^2}\Big)^{r} \le \frac{T^r n^r (r - 1)!\, r!}{r!\, r!\, 2} \Big(\frac{2}{n^2}\Big)^{r} = \Big(\frac{2T}{n}\Big)^{r} \frac{1}{2r}.$$
Therefore,
$$E\nu(G_{n,T}) = \sum_{r=1}^{n} E\alpha_r$$
has the upper bound
$$\sum_{r=1}^{\infty} \frac{1}{2r} \Big(\frac{2T}{n}\Big)^{r}.$$
Under the conditions of the theorem, $E\nu(G_{n,T})$ tends to zero and the number of cycles in $G_{n,T}$ is zero with probability approaching 1.

We denote by $A_{n,T}$ the set of all graphs with $n$ labeled vertices and $T$ edges whose components are trees and unicyclic components. Note that loops and cycles of length 2 are permitted. As before, $\theta = 2T/n$, $\varepsilon = 1 - 2T/n$.

Theorem 2.3.3. If $n, T \to \infty$ such that $\varepsilon^3 n \to \infty$, then
$$P\{G_{n,T} \notin A_{n,T}\} \le \frac{4}{\varepsilon^3 n}.$$

Proof. We have to prove that under the conditions of the theorem, the graph $G_{n,T}$ has a component with more than one cycle with probability less than $4/(\varepsilon^3 n)$. If in $G_{n,T}$ there exists such a component, then in $G_{n,T}$ there either exists a subgraph that consists of two cycles connected by a chain (pince-nez) or there exist two cycles that have a common sequence of edges (a cycle with a bridge). We use $\xi^{(t)}_{r,s}$ to denote the number of subgraphs of $G_{n,T}$ that consist of cycles of lengths $r$ and $s$ connected by a chain of $t$ edges, and denote by $\eta^{(t)}_r$ the number of subgraphs of $G_{n,T}$ that consist of a cycle of length $r$ with two vertices connected by a sequence of $t$ edges. To prove the assertion of the theorem, it is sufficient to show that the
104 Evolution of random graphs mean number of such subgraphs tends to zero. It is clear that P{GnJ <?AnJ} = f r,a,t rj r,s,t rj By reasoning in the same way as in the proof of formula B.3.1), we obtain the estimates \r + tj \n2) n \n 71 \ / 9 \ r+'s+' o /Tr\r+S+' X Thus, the mathematical expectation of the total number of pince-nez and cycles with a bridge can be estimated as follows: 00 00 \ r+s+t /2TV Theorem 2.3.4. Ifn,T -> 00 5mc^ ^af 0 = 27/n -> A., 0 < A. < 1, f/ze distribution of the number of cycles v{Gnj) in Gnj converges to the Poisson distribution with parameter Proof. In view of Theorems 2.3.1 and 2.3.3, we can reduce the proof to the application of Theorem 2.1.5 concerning the random graph Gn T without loops and multiple edges. Indeed, by the formula of total probability, P{v(Gn,T) = k} = J2 p{«i =ku a2= k2, GnJ e Anj) k\+k2<k x P{v(Gnj) = k I ai = ki, ec2 = k2, Gnj e Anj) + ^2 p{"i =k\, a2 =k2, Gn,T ? Anj} k\+k2<k x P{v(Gnj) = k\ai=ku ct2 = k2, GnJ ? Anj).
2.3 Random graphs with independent edges 105 According to Theorem 2.3.3, P{Gnj ? Anj} -> 0, and it is not difficult to see that P{GnJ eA\a\ =ku cc2 = k2) = P[v(Gn,T) = k\ak=k\, ec2= k2, Gnj e An Thus {v(GnJ) = k}= J2 Pf"i = *i. «2 = h) B.3.2) k\+k2<k According to Theorem 2.1.5, under the conditions of Theorem 2.3.4, for any fixed k\, k2 = 0, 1, ..., and k > k\ + k2, \k—k\—k2 —A3 where 2 2 4 Now it follows from B.3.2) and Theorem 2.3.1 that -ki -k2)\ k\ x g where X X2 1 A = A3 + - + — = -- By reasoning in the same way, we can reformulate the theorems proved for ^ 'T so that they can also be applied to subcritical and critical graphs Gnj ¦ As an
106 Evolution of random graphs example, we give an analogue of Theorem 2.1.6 on the number x(Gnj) and on the maximum sizes ^(G«,r). P(Gnj), and ct(Gnj) of trees, unicyclic components, and all components in Gnj, respectively. Theorem 2.3.5. Ifn, T -> oo such that e = 1 —IT In -> 0 andean -> oo, then for any fixed x, P x(GnJ) + - logs < xJ-- logs 2 ° ~ V 2 for any fixed x > 0, f J -oo 00 where Zs{x) is defined in Theorem 1.8.7; and a(GnfT) -u<z} = P{/3r}(GnJ) -u< where E = — log(#e~0), ^ = 2T/n, and u is the root of the equation /o\ 1/2 rij 2 For the same reasons, Theorem 2.2.2 can be extended to the critical graph Gnj ¦ Theorem 2.3.6. Ifn, T -> oo such that en1^ -> 2 • 3~2/3u, where v is a con- constant, then For the supercritical case where n, T -> oo such that ?3n -> — oo, we present here only the simplest results. In the final section of this chapter, we will give a short review of what is known about the supercritical graphs. It is known that if 6 = 2T/n -> A., A. > 1, a giant component appears in the graph Gn( 'T and, with probability tending to 1, G^ T consists of trees, unicyclic components, and this giant component formed by all the vertices that are not contained in trees and unicyclic components. As 2T/n increases, the size of the giant component increases and the number of unicyclic components decreases. If9 = 2T/n—>X,l <A.<oo, then the number of unicyclic components has a Poisson distribution. For 6 -> oo, we have the following result. Theorem 2.3.7. Ifn, T -> oo such that 6 = 2T/n -> oo, then with probability tending to 1, there are no unicyclic components in Gnj.
Proof. The number of unicyclic graphs with $r$ labeled vertices is not greater than $c\, r^{r - 1/2}$, where $c$ is a constant (see, e.g., [16]). Denote by $\varkappa_r(G_{n,T})$ the number of unicyclic components of size $r$ in $G_{n,T}$. By reasoning as in the proof of (2.3.1), we find that
$$E\varkappa_r(G_{n,T}) \le c \binom{n}{r}\binom{T}{r} r^{r - 1/2}\, r!\, \Big(\frac{2}{n^2}\Big)^{r} \Big(1 - \frac{2r(n - r)}{n^2} - \frac{r(r - 1)}{n^2}\Big)^{T - r}, \tag{2.3.3}$$
where the last factor is the probability that the $T - r$ edges, which were not used for the construction of unicyclic components, neither connect the vertices in the component with the vertices outside the component nor connect any pair of vertices in the component. It is sufficient to prove that
$$\sum_{1 \le r \le n} E\varkappa_r(G_{n,T}) \to 0.$$
With the help of estimate (2.3.3), we find that
$$E\varkappa_r(G_{n,T}) \le c\, (\theta e)^{r}\, e^{-2r(n - (r+1)/2)(T - r)/n^2}.$$
For sufficiently large $n$ and $1 \le r \le n$,
$$e^{-2r(n - (r+1)/2)(T - r)/n^2} \le e^{-r\theta/4}$$
and $q = \theta e^{1 - \theta/4} < 1$. Therefore
$$\sum_{1 \le r \le n} E\varkappa_r(G_{n,T}) \le c \sum_{r=1}^{\infty} q^{r} = \frac{cq}{1 - q}.$$
Since $q = \theta e^{1 - \theta/4} \to 0$ as $\theta \to \infty$, we conclude that a unicyclic component exists in $G_{n,T}$ with a probability that tends to zero.

Finally, we consider the behavior of the random graph $G_{n,T}$ near the point where the graph becomes connected. Denote the number of components in $G_{n,T}$ by $\varkappa_{n,T}$.

Theorem 2.3.8. If $n \to \infty$ and $2T = n\log n + xn + o(n)$, where $x$ is a constant, then with probability tending to 1, the graph consists of a giant connected component and isolated vertices. Also, for any fixed integer $k = 0, 1, \dots$,
$$P\{\varkappa_{n,T} - 1 = k\} \to \frac{e^{-kx}}{k!}\, e^{-e^{-x}}.$$
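The Poisson limit in Theorem 2.3.8 has parameter $e^{-x}$, which is also the limit of the mean number of isolated vertices. A vertex is isolated exactly when none of the $2T$ independent uniform endpoints falls on it, so this mean equals $n(1 - 1/n)^{2T}$. The Python sketch below compares that exact mean with $e^{-x}$ and evaluates the limiting probabilities. The function names and the chosen values of $n$ and $x$ are assumptions made for this illustration, and the $o(n)$ term in $2T$ is taken to be zero up to rounding.

```python
import math


def expected_isolated(n, two_T):
    """Exact mean number of isolated vertices of G_{n,T}: each vertex is
    missed by all 2T uniform endpoints with probability (1 - 1/n)**(2T)."""
    return n * (1.0 - 1.0 / n) ** two_T


def poisson_limit(k, x):
    """Limit of P{number of components - 1 = k}: Poisson with mean exp(-x)."""
    lam = math.exp(-x)
    return lam ** k * math.exp(-lam) / math.factorial(k)


x = 0.5
for n in (10**3, 10**5, 10**7):
    two_T = 2 * round((n * math.log(n) + x * n) / 2)  # keep 2T even
    print(n, expected_isolated(n, two_T), math.exp(-x))

# Limiting probability that the graph is connected (k = 0 isolated vertices).
print(poisson_limit(0, x))
```

Agreement of the means does not by itself prove convergence in distribution; the sketch only shows that the scaling $2T = n\log n + xn$ is the one at which the number of isolated vertices stays of order one.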
Proof. We have to prove that, with probability tending to 1, $G_{n,T}$ consists of one giant component and isolated vertices, and that the distribution of the number of these isolated vertices converges to the Poisson distribution with parameter $e^{-x}$. The edges of $G_{n,T}$ appear as a result of $T$ independent trials, and these $T$ trials can be considered as the allocation of $2T$ particles into $n$ cells such that each particle is allocated independently of the others and falls into any of the $n$ cells with equal probability. Therefore the number of isolated vertices in $G_{n,T}$ has the same distribution as the number $\mu_0(2T, n)$ of empty cells in the well-studied classical scheme of allocating particles. Under the conditions of the theorem, the distribution of $\mu_0(2T, n)$ converges to the Poisson distribution with parameter $e^{-x}$.

To complete the proof, it suffices to show that, with probability tending to 1, the remaining vertices form one giant component. If, in addition to the isolated vertices, there were two other components, then the graph would contain a tree of size $r$, $2 \le r \le n/2$, such that no vertex of the tree would be connected to any vertex outside the tree. A skeleton of one of the two components could play the role of such a tree. By $\xi_r$ we denote the number of trees of size $r$ which are the skeletons of connected components of $G_{n,T}$. We will show that under the conditions of the theorem,

$$\sum_{2 \le r \le n/2} \mathbf{E}\xi_r \to 0,$$

and consequently, with probability tending to 1, such a tree does not occur in $G_{n,T}$.

We can represent $\xi_r$ as a sum of indicators and find that

$$\mathbf{E}\xi_r \le \binom{n}{r} r^{r-2} \binom{T}{r-1} (r-1)! \left(\frac{2}{n^2}\right)^{r-1} \left(1 - \frac{2r(n-r)}{n^2}\right)^{T-r+1}. \tag{2.3.4}$$

This formula is similar to (2.3.1): We choose $r$ vertices and $r - 1$ edges that form the tree, and the last factor is the probability that none of the $T - r + 1$ edges that remain connects a vertex from the set of $r$ selected vertices with a vertex from the set of $n - r$ remaining vertices.
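The correspondence between isolated vertices and empty cells used in this proof can be explored numerically. The following is a minimal sketch, not part of the original text: the function names are hypothetical, vertices are labeled 0, ..., n-1, and a vertex is counted as isolated only if no edge endpoint (including a loop endpoint) falls on it, which is the convention that matches the allocation of 2T particles.

```python
import math
import random


def sample_edges(n, T, rng):
    """Draw T edges; each endpoint is an independent uniform vertex in 0..n-1."""
    return [(rng.randrange(n), rng.randrange(n)) for _ in range(T)]


def isolated_vertices(n, edges):
    """Count vertices hit by no endpoint (the empty cells of the allocation)."""
    touched = set()
    for u, v in edges:
        touched.add(u)
        touched.add(v)
    return n - len(touched)


def expected_empty_cells(n, particles):
    """Mean number of empty cells when `particles` balls fall uniformly into n cells."""
    return n * (1.0 - 1.0 / n) ** particles


if __name__ == "__main__":
    n, x = 1000, 0.5
    T = round((n * math.log(n) + x * n) / 2)  # so that 2T is about n log n + xn
    rng = random.Random(1)
    runs = 50
    mean = sum(isolated_vertices(n, sample_edges(n, T, rng)) for _ in range(runs)) / runs
    # Empirical mean, exact mean of empty cells, and the limiting Poisson parameter.
    print(mean, expected_empty_cells(n, 2 * T), math.exp(-x))
```

For moderate n the exact mean of empty cells is already close to $e^{-x}$, in line with the Poisson limit stated in the theorem; the empirical mean fluctuates around it.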
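By using formula (2.3.4), we can check, for example, that with probability tending to 1, there are no isolated edges in $G_{n,T}$. Indeed, for $r = 2$,

$$\mathbf{E}\xi_2 \le \binom{n}{2} T\, \frac{2}{n^2} \left(1 - \frac{4(n-2)}{n^2}\right)^{T-1}, \tag{2.3.5}$$

and the right-hand side of (2.3.5) tends to zero if $n \to \infty$ and $2T = n \log n + xn + o(n)$. It follows from (2.3.4) that

$$\mathbf{E}\xi_r \le \frac{n^r r^{r-2}}{r!} \left(\frac{2T}{n^2}\right)^{r-1} e^{-2r(n-r)(T-r+1)/n^2}$$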
2.4 Nonequiprobable graphs 109 and for all sufficiently large n, -1 {2TY'1 Therefore 2 oo 2 3<r<n/2 r=3 ~ 27A - If n -> 00 and 2T = n log n + x« + o(«), then and for all sufficiently large n, 1-40/9 where c is a constant. Therefore, under the conditions of the theorem, Taking into account that E^2 -*• 0 also, we see that, with probability tending to 1, the graph Gnj has only one component besides the isolated vertices. ¦ 2.4. Nonequiprobable graphs The model of the random graph Gnj considered in the previous section can be easily extended to nonequiprobable graphs. However, the approach based on the generalized scheme of allocation, which reduces the investigations of equiprobable graphs to some problems concerning sums of independent random variables, does not apply to nonequiprobable graphs. In this case, few results have been obtained because of the lack of effective methods to investigate these objects. In this section, we consider a generalization of the random graph Gnj of the previous section. We preserve the notation Gnj for this nonequiprobable graph with n vertices labeled with the numbers 1,2,... ,n and T edges, which can be obtained by the following procedure. We consider T independent trials, in each of which one edge is drawn. The edge connects two different vertices or forms a loop;
the vertices with labels $i$ and $j$ are connected with the probability $2p_ip_j$, and the loop at vertex $i$ is formed with the probability $p_i^2$; $i, j = 1, \dots, n$, $p_1, \dots, p_n > 0$, $p_1 + \dots + p_n = 1$. Thus, after $T$ trials we have a realization of the random graph $G_{n,T}$, which may have loops and multiple edges. The main result of this section is the following assertion.

Theorem 2.4.1. Assume that $p_i = a_i/n$, where $a_i = a_i(n)$, $0 < \epsilon \le a_i \le E$, $i = 1, \dots, n$, $\epsilon$ and $E$ are constants, and the limit

$$a^2 = \lim_{n \to \infty} \frac{1}{n} \sum_{i=1}^{n} a_i^2$$

exists. Then, if $n, T \to \infty$ such that $2T/n \to \lambda$, $0 < \lambda a^2 < 1$, the distribution of the number of cycles $\nu(G_{n,T})$ in the graph $G_{n,T}$ converges to the Poisson distribution with parameter $\Lambda = -\frac{1}{2} \ln(1 - \lambda a^2)$.

In proving the theorem, the limit distribution of the random variable $\alpha_r$, the number of cycles of length $r$, and the joint limit distribution of $\alpha_{r_1}, \dots, \alpha_{r_s}$ are obtained.

Theorem 2.4.2. Under the conditions of Theorem 2.4.1, without the requirement $\lambda a^2 < 1$, the distribution of the random variable $\alpha_r$ for any fixed $r$ tends to the Poisson distribution with parameter $\lambda_r = \lambda^r a^{2r}/(2r)$.

Theorem 2.4.3. Under the conditions of Theorem 2.4.1, without the requirement $\lambda a^2 < 1$, the joint distribution of $\alpha_{r_1}, \dots, \alpha_{r_s}$ for any fixed $1 \le r_1 < \dots < r_s$ converges to the distribution of $s$ independent random variables that have the Poisson distributions with parameters $\lambda_{r_1}, \dots, \lambda_{r_s}$, respectively.

The proof will be accomplished by the method of moments.

A cycle of length $r$ has no self-intersections if it is composed of $r$ vertices and exactly $r$ edges of $G_{n,T}$. Denote by $\alpha_r$ the number of cycles without self-intersections of length $r$, $r \ge 3$, in the random graph $G_{n,T}$. For $r$ distinct vertices $i_1, \dots, i_r$, let $\xi_{i_1, \dots, i_r} = 1$ if in $G_{n,T}$ there exists a cycle composed of these $r$ vertices containing exactly $r$ edges of $G_{n,T}$; in other cases, we set $\xi_{i_1, \dots, i_r} = 0$. Then

$$\alpha_r = \sum_{\{i_1, \dots, i_r\}} \xi_{i_1, \dots, i_r}, \tag{2.4.1}$$

where the summation is taken over all $\binom{n}{r}$ distinct unordered sets of $r$ distinct indices.
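One way to realize the sampling procedure of this section is to draw the two endpoints of each edge independently from the distribution $(p_1, \dots, p_n)$: the pair $\{i, j\}$ then appears with probability $2p_ip_j$ and a loop at $i$ with probability $p_i^2$. The sketch below rests on that assumption; the function names are hypothetical and vertices are labeled 0, ..., n-1. It also computes the mean number of loops, $T \sum_i p_i^2$, directly from the loop probability.

```python
import random


def sample_graph(p, T, rng):
    """Return T edges; each trial picks both endpoints independently from p."""
    vertices = range(len(p))
    return [tuple(rng.choices(vertices, weights=p, k=2)) for _ in range(T)]


def expected_loops(p, T):
    """Mean number of loops: T trials, each a loop at i with probability p_i**2."""
    return T * sum(pi * pi for pi in p)


if __name__ == "__main__":
    rng = random.Random(7)
    n, T = 200, 150
    a = [0.5 if i % 2 == 0 else 1.5 for i in range(n)]  # illustrative weights with sum n
    p = [ai / n for ai in a]
    edges = sample_graph(p, T, rng)
    loops = sum(1 for u, v in edges if u == v)
    print(loops, expected_loops(p, T))
```

With $p_i = a_i/n$ the mean number of loops equals $(T/n) \cdot \frac{1}{n}\sum_i a_i^2$, so it stays bounded when $2T/n$ converges.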
In the complete graph with vertices $i_1, \dots, i_r$, there exist $(r-1)!/2$ distinct cycles containing exactly $r$ edges. We label these cycles in an arbitrary order with the numbers $j = 1, \dots, (r-1)!/2$ and represent the random variable
2.4 Nonequiprobable graphs 11 1 ?/, /;. as the sum of indicators: (r-l)!/2 ft. .v= E C..V BA2) where ?;7) ;. = 1 if the y'th cycle exists in Gnj, and ?7. ;. = 0 otherwise. We now investigate the behavior of the random variable v(Gnj) = cc\ H \-an, where the variables ar are defined by B.4.1) for r > 3, ct\ is the number of loops, and «2 is the number of pairs of parallel edges in Gn j ¦ Each cycle in the graph Gnj may be thought of as the set of edges that form this cycle; therefore, the following assertion is needed for evaluating such probabilities as P{%;^ ir = 1}. Let Vr = {(z'i, j\),..., (ir, >)} be the set of r distinct pairs of vertices in the graph Gnj, where ik ^ Jk, k = 1,..., r. Denote by P(Vr) the probability of the event that all the edges from Vr occur in Gnj- Lemma 2.4.1. Ifn, T -> oo, 2T/n -> A., 0 < A. < oo, 0 < e < a; < E < oo, / = 1,..., n, then for arbitrary fixed e, E, and r, P{Vr) = X-ahah . ..airajr (l + O Q)) B.4.3) uniformly with respect to a\,... ,an and all sets Vr. Moreover, for any 8 > 0, there exists a constant c such that, for all r and n, ) P(Vr) < c(X^r8) ahaJx ¦ ¦¦airajr. B.4.4) Proof. Set qk = 2pikpjk, k = 1,..., r. Then T[m\-\ \-mr] mi! • • ¦ mri m\,...,mr>l qr)T~r '"q v (\ n, ,, \T—m\ mr i /I 4 o x yv — q\ — • • • — qr) j. \L.L*.j) Here x[m] = jc(jc — 1) • • • (x — m + 1); the summation in X]' is taken over all sets {mi,..., mr) in which mi,..., mr > 1 and there exists /, 1 < i < r, such that
112 Evolution of random graphs m; > 1. It is clear that A -qx qr)T'r < 1, and for an arbitrary fixed r, (l-qi qrf-r = 1 + O(\/n). B.4.6) In addition, (T — r\[m\-\ \-mr-r] ( Z r-'^-'a )r-'-"- B.4.7) where G _ r _ l)[m\-\ \-mr-r-l] , mx\---mr\ m\,...,mr>_\ X Let /,- = mi —2, lj = mj — 1, j ^ i (recall that m, > 1). Then = {T ~r ~ x tfj1 • • ¦ q[r{\ - qx qr)T-r-\-h-..-lr = L B 4 8) Now assertion B.4.3) follows from B.4.5)-B.4.8), and assertion B.4.4) from B.4.5), B.4.7), and B.4.8), since A 2TrE2 (r) , () iq\---qr < —j-ai,ah ¦ ¦ ¦ airajr.
2.4 Nonequiprobable graphs 113 Corollary 2.4.1. Ifn, T -> oo, 2T/n -> A, 0 < X < oo, 0 < e < a,¦ < E < oo, / = 1,...,«, then for arbitrary fixed e, E, X, and r, uniformly with respect to j, \ < j < (r—\)l/2, all sets {i\,..., ir}anda\,... ,an. Moreover, for any 8 > 0, there exists a constant c such that, for all r and n, 2 l Proof. The equality ^/.7/. = 1 holds if and only if in Gn j there exist r fixed edges, {(k\, j\),..., (kr, jr)}, kv ^ jv, v = 1,..., r, which form the yth cycle on the vertices i\,..., ir. For these edges, the sets {k\,..., kr] and {j\,..., jr) coincide with the set {i\,..., ir). Therefore, the corollary follows from Lemma 2.4.1. The notation {i\,..., ir) denotes an unordered set of distinct indices i\,..., ir; the number of such sets is ("). For ordered sets of distinct indices i\, ..., ir, we will use the notation (i\,..., ir); the number of such sets is n^r\ By the symbols we will denote the summations over all distinct unordered and ordered sets of r distinct indices, respectively. It is clear that the summation over all unordered sets {i\,..., ir} is well suited to summands fil...tr whose values are invariant with respect to the permutations of indices. For such summands, _ f«...ir=r\ and, moreover, /¦(I) ,-A) ,¦(*) Ak) = I. ...lr ...I, ...If if the left-hand side summation is taken over all distinct ordered sets of distinct r-dimensional indices i\ ,..., i? . Lemma 2.4.2. If 0 < e < ai < E < oo, i = I, ..., n, then for any fixed r, as n -^ oo, i>,2V= E <-AA + / V i = l
114 Evolution of random graphs Proof. The following representation is valid: i>?y= t «i-<= e <¦¦¦<+ ^ <¦¦¦<- i=\ I /|,...,/V=1 {'I i,) (i\ '/¦> where the summation in the first sum is taken over all distinct ordered sets of distinct indices, and in the asterisked sum, over all distinct ordered sets, each have at least two identical indices. The number of summands in the first sum is «^; the number of summands in the second sum is equal to nr — «^ and does not exceed crnr~l where the constant cr depends only on r. Therefore and the proof is complete. ¦ Corollary 2.4.2. Under the conditions of Theorem 2.4.2, for any fixed r > 3, kra2r E 2r Moreover, for any 8 > 0, there exists a constant c such that Ear < c 2r Proof. Using representations B.4.1) and B.4.2), with the aid of B.4.9), Corol- Corollary 2.4.1, and Lemma 2.4.2, we obtain (r-l)!r E«/- = ^ z (r — \)\Xr ^-^ 9 9 / / 1N ^ a? ---a,2 1 + 0 - 2/1^! X'a1' The second assertion follows immediately from the inequality of Corollary 2.4.1. We now evaluate the factorial moments of ar. If Sn = ?i + ••• + ?„, where ?!,...,?„ take the values 0 and 1 only, then according to Theorem 1.1.4, Sn(Sn-l)---(Sn-m + l)= J2 &,•••&„, B.4.12) (k\,...,km) where the summation is taken over all distinct ordered sets of m distinct indices.
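Identity (2.4.12) can be checked by brute force on a small example. The following sketch is only an illustration with hypothetical function names: it compares the falling factorial of a sum of 0/1 values with the sum of products over all ordered sets of m distinct indices.

```python
from itertools import permutations
from math import prod


def falling_factorial(s, m):
    """Compute s(s-1)...(s-m+1)."""
    return prod(s - k for k in range(m))


def sum_over_ordered_tuples(xi, m):
    """Sum of xi[k1]*...*xi[km] over all ordered tuples of m distinct indices."""
    return sum(prod(xi[k] for k in idx) for idx in permutations(range(len(xi)), m))


if __name__ == "__main__":
    xi = [1, 0, 1, 1, 0]
    for m in range(1, 5):
        print(m, falling_factorial(sum(xi), m), sum_over_ordered_tuples(xi, m))
```

Both columns agree for every m, because a product of indicators is 1 exactly when all m chosen indices point at ones, and there are $S_n(S_n - 1) \cdots (S_n - m + 1)$ such ordered choices.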
2.4 Nonequiprobable graphs 115 In our case, the indices have a composite structure because (/-l)!/2 <*= E E C- {/'i iV) ./=• The following representation is analogous to B.4.12): ar(ar - 1) ¦ ¦ • («,-* + 1) = J2$W ,„ • • -^ ,m), B.4.13) /| •••/r Zj •••/r where the summation is taken over all distinct ordered sets of distinct indices of the form ({i\,..., ir}, j); the set {i\,..., /',-} in the index is considered an unordered set of distinct indices, and j indicates the number of the cycle formed by the vertices i\,... ,ir. We show that under the conditions of Theorem 2.4.2, for any fixed r and any fixed m > 1, /\r 2r\m ^^(^-j . B.4.14) This assertion for m = 1 follows from Corollary 2.4.2. In order to become accustomed to the more complicated notation, we first consider the case m = 2. By B.4.13), E Decompose the right-hand side sum into two sums. Let the first sum Ei include the summands with nonintersecting sets {i\ ',..., ij- } and {/j \ ..., rr '}. When we take into account that in this case 2r edges must exist to guarantee ,-A) ,-(l) ~ ^ B) .B) l\ "' 'r 1 r and by using Lemma 2.4.1, we obtain fcO"l) _ fc(/2) _ il ,A) ,(D "~ 5,-B) B) — L f Therefore f^ /('¦0!\2 2 2 2 X
116 Evolution of random graphs It is clear by virtue of B.4.9) and B.4.10) that Therefore, by virtue of Lemma 2.4.2, 2 / (r\J \n ^7 V 2r B.4.15) We now show that the remaining sum E2 tends to zero. The summation in ?2 is taken over the pairs of composite indices in which the sets {i[ \ ... ,ir ^ and {/j \ ..., ir2^} have at least one common element. Each composite index ({i\,..., ir}, j) corresponds to a cycle in the complete graph with n vertices; the cycle consists of r edges and the vertices i\,... ,ir- Two cycles corresponding to the indices ({i\ ,... ,ir }, j\) and ({/[ \ ..., ir }, 72) can have M < 2r distinct vertices and L distinct edges. We decompose the sum E2 into the sums Em,i containing summands with fixed values of the parameters M and L. The number of such sums does not exceed BrJ; therefore it is sufficient to prove that any sum ?m,z tends to zero. It is easy to see that in the case M < 2r, the inequality L > M + 1 is valid. The number of summands in the sum ^m,l does not exceed nM, and the probability that L fixed edges appear in Gnj does not exceed, by virtue of B.4.4), the value cn~L. This implies ^ml S—rzn <-¦ B.4.16) nL M n Therefore, as n -> 00, ?2 -> 0. B.4.17) The assertion B.4.14) for m = 2 follows from B.4.15) and B.4.17). Now let us consider the factorial moment of an arbitrary order m. By B.4.13), Ear = Ei + S2, where the sum Ei includes only summands that do not have a pair of sets from {*' 1 , ¦ ¦ •, ir },•••, {i™ , • ¦ ¦, ir } with common elements. In this case, rm edges must occur in the graph Gnjio guarantee that the corresponding random variables equal 1. From this and Lemma 2.4.1, it follows that tUm) _ 1} ,.(m) Am) — i f \ mr a ¦ ¦ ¦ -n2 ¦ ¦ ¦ n2 ¦ ¦ -n2 C1 "•(I) ".A) u.(m) u.(m)\l
2.4 Nonequiprobable graphs 117 and, by B.4.9) and Lemma 2.4.2, \r 2r\m J B.4.18) J It remains to prove that the sum E2 taken over the remaining sets of indices tends to zero. The summation in E2 is taken over m sets of composite indices that have at least one common element in at least one pair of the sets {i\p ,..., ir }, {/| ,..., if }, p ^ q. Recall that each composite index corresponds to a cycle in the complete graph with n vertices. The cycles corresponding to m indices can contain M distinct vertices and L distinct edges. We decompose the sum E2 into j I th ^ ii d ith fid l f th M d the sums ^m,l containing summands with fixed values of the parameters M and , t L. The number of such sums does not exceed (rmJ; therefore it is sufficient to t! prove that any sum ?m,z tends to zero. It is clear that if M < rm, then L > M +1. } Thus, since the number of summands in the sum ~Em,l does not exceed nM, and by " B.4.3) the probability of L fixed edges occurring in Gnj does not exceed cn~L nL M n Therefore, as n -> 00, ?2 -> 0. B.4.19) The assertion B.4.14) follows from B.4.18) and B.4.19). By B.4.14), the limit distribution for ar, r > 3, is the Poisson distribution with parameter Xr = Xra2r/Br). It is easy to see that in the current situation the number of loops a\ and the number of pairs of parallel edges a2 approach the Poisson distributions with parameters k\ = Xa2/2 and A.2 = X2a4/4, respectively. This proves Theorem 2.4.2. The more general Theorem 2.4.3 can be proved analogously. It is sufficient to verify that under the conditions of the theorem, for arbitrary fixed integers m \,..., ms, where Xra2r Xr = ^—. 2r By B.4.13), ,(k)\ (j(k) (k)\\ where j(k) _ i.(l,k) ^(U)! I =\ mk k =
118 Evolution of random graphs are unordered sets of a vertices, and // , / = 1 m^, k = 1,..., .v, are the numbers of cycles of length r^ under the labeling chosen. Therefore (,) -I,--- ,§(,, where We decompose the sum on the right-hand side of this representation into two parts; let the sum Ei include only summands with the distinct elements in all if- , I = 1,..., ntk, k = 1,..., s; and let the sum ?2 include all the remaining summands. For the summands of the first sum, the corresponding random variables equal 1 only if there exist m\r\ + - ¦ ¦+msrs fixed edges in Gnj. Therefore, by Lem- Lemma 2.4.1, pko) and, by B.4.9), B.4.10), and Lemma 2.4.2, 2r>\m' It remains to prove that E2 tends to zero. The summation in S2 is taken over sets of composite indices in which at least one of the elements 1, 2,..., n is encountered at least twice. A cycle corresponds to each of the composite indices. The existence of a common element in the cycles implies that the number M of distinct vertices contained in the cycles and the number L of distinct edges involved in the cycles satisfy L > M + 1. We decompose the sum E2 into a finite number of sums T,m,l containing summands with fixed values of the parameters M and L. By virtue of B.4.3), for each of these sums, the estimate holds because the number of summands does not exceed nM, and the probability of L fixed edges occurring in Gnj does not exceed cn~L. This proves Theorem 2.4.3. To prove Theorem 2.4.1, we need the following auxiliary assertion.
2.4 Nonequiprobable graphs 119 Lemma 2.4.3. Let ?,,..., ffl be nonnegative integer-valued random variables such that for an arbitrary fixed s and arbitrary nonnegative integers k\, ..., ks, as n -^ oo, where a\, aj, ¦ ¦ ¦ is a fixed sequence of nonnegative numbers. More- Moreover, suppose " -> 0 B.4.20) as s -+ oo, uniformly in n, and let oo = A < oo. k=i Then the distribution of the random variable ?„ = ?| + to the Poisson distribution with parameter A. ,{n) Sn converges Proof. We show that for an arbitrary fixed ? > 0 and an arbitrary fixed m, Ame'/ ml < ? for sufficiently large n. For fixed e and m, there exists s such that ? 3' ml ml where As = a\ + • • • + as. It is not hard to see that »> =m}\ < 0). Therefore, by B.4.20), \P{^n) = m} - P{$s{n) = m}\ < e/3 for sufficiently large s. Finally, the conditions of the lemma yield the convergence of the distribution of J + • • • + ?s (for any fixed 5) to the Poisson distribution with parameter As = a\ + ¦ ¦ ¦ + as. Therefore -As ml ? < - ~ 3 for sufficiently large s. ¦ Theorem 2.4.1 follows from Theorem 2.4.3 and Lemma 2.4.3, whose conditions are satisfied when ka2 < 1.
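As a check on how the parameter $\Lambda$ of Theorem 2.4.1 arises from Lemma 2.4.3, the limiting Poisson parameters $\lambda_r = \lambda^r a^{2r}/(2r)$ can be summed over all cycle lengths $r \ge 1$ (loops and pairs of parallel edges included). The following is a sketch of that summation, assuming $0 < \lambda a^2 < 1$ and using the series $\sum_{r \ge 1} x^r/r = -\ln(1 - x)$ for $|x| < 1$:

$$\Lambda = \sum_{r=1}^{\infty} \lambda_r = \frac{1}{2} \sum_{r=1}^{\infty} \frac{(\lambda a^2)^r}{r} = -\frac{1}{2} \ln(1 - \lambda a^2).$$

For $\lambda a^2 \ge 1$ the series diverges, which is consistent with the requirement $\lambda a^2 < 1$ appearing in Theorem 2.4.1 but not in Theorems 2.4.2 and 2.4.3.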
2.5. Notes and references

The investigation of the evolution of random graphs began when P. Erdős and A. Rényi published the results of their study [37] in 1960. Along with the basic properties of the random graph $\mathcal{G}_{n,T}$, they discovered the effect known as a phase transition. At about the same time, V. E. Stepanov studied the graph $G_{n,p}$, as documented later [133, 134, 135]. Until recently, Stepanov's results had not seemed to receive wide recognition. In particular, Stepanov proved that if $p = c/n$, where $c$ is a constant, $c > 1$, then the size of the giant component is asymptotically normal with mean $n\alpha(c)$ and variance $n\beta(c)$, where

$$\alpha(c) = 1 - \frac{\gamma}{c}, \qquad \beta(c) = \frac{\gamma (1 - \gamma/c)}{c (1 - \gamma)^2},$$

and $\gamma < 1$ is the root of the equation $\gamma e^{-\gamma} = c e^{-c}$. A similar assertion for the graph $\mathcal{G}_{n,T}$ was proved by B. Pittel [123] about twenty years later. He found that the size of the giant component of $\mathcal{G}_{n,T}$ is asymptotically normal with parameters $n\alpha(c)$ and $n\beta(c)(1 - 2\gamma + 2\gamma^2/c)$ as $n, T \to \infty$ and $2T/n \to c > 1$.

Many open questions concerning the evolution of random graphs remain. The main goal of this chapter is to demonstrate the approach based on the generalized scheme of allocation in investigations of the evolution of random graphs. Section 2.1 shows that fine properties of subcritical graphs can be obtained in a rather simple and natural way, especially as concerns the behavior of subcritical graphs near the critical point. The transition phenomena for the graph $\mathcal{G}_{n,T}$ were first considered by B. Bollobás [20]. The results presented in Section 2.1 can be found in [77]. The approach based on the generalized scheme of allocation allowed us to prove asymptotic normality of the number of unicyclic components and find the limit distribution of the maximum sizes of trees and unicyclic components.

Section 2.2 is devoted to critical graphs.
The behavior of random graphs near the critical point, and especially in the critical domain where the giant component appears, is very complicated and difficult to investigate. The investigations of the behavior are far from complete, but even now the results obtained could fill another book. Much information about random graphs can be found in the fundamental work by Bollobás [21] and in the book [105], which is devoted to the evolution of random graphs. A detailed investigation of the birth of the giant component is given in [63].

Supercritical graphs are considered by Łuczak [99], who, in particular, proved that the right-hand bound of the critical domain is determined by the conditions $n, T \to \infty$, $(1 - 2T/n)^3 n \to -\infty$.

Formally, to analyze supercritical random graphs, we can use the representation of almost all such graphs as a combination of components of three types: one giant
component, trees, and unicyclic components. However, this approach is hampered by the absence of a simple formula for the number of connected graphs with $n$ vertices and $T$ edges with $k = T - n > 0$. Note that $k = T - n$ is equal to the number of independent cycles in the graph and is called the cyclomatic number of the graph. Denote by $c(n, k)$ the number of connected graphs with $n$ labeled vertices and a cyclomatic number $k$. It is clear that $c(n, -1)$ is the number of trees, and by the Cayley formula, $c(n, -1) = n^{n-2}$, whereas $c(n, 0)$ is the number $u_n$ of unicyclic graphs considered in Section 1.7. The numbers $c(n, k)$ were investigated by Stepanov (see [10, 142, 143]) and E. M. Wright [151, 152] and are known as the Stepanov-Wright numbers (see [143]). As $n \to \infty$ and $k^3/n \to 0$, c(n, k) = where, as it was proved by Meertens, $d = 1/(2\pi)$ (see Bender, Canfield, and McKay [16]).

We hope that the results of the study by Bender et al. [17], who give the asymptotics of $c(n, k)$ for all regular variations of the parameters $n$ and $k$, can be used in the application of the generalized scheme to random graphs and help to bring the investigations of supercritical graphs to the level attained for the subcritical case in Section 2.1. Note that obtaining the limit distributions of numerical characteristics of supercritical graphs would be merely a problem of averaging if the joint distribution of the size of the giant component and the number of its edges were known.

The parameter $\theta = 2T/n$ plays the role of time in the evolution of random graphs. Therefore, each numerical characteristic of a random graph can be considered not only as a random variable, but also as a random process with the time parameter $\theta$. Of significant interest is the approach using the convergence of such processes. This approach is used in the recent papers [34, 62, 127].
Note that the investigations of convergence of such random processes in combinatorial problems were started by B. A. Sevastyanov [132] and Yu. V. Bolotnikov [22, 23, 24].

The random graph $G_{n,T}$ discussed in Section 2.3 was investigated by Kolchin [79, 83]. This graph provides an appropriate model of the graph corresponding to the left-hand side of a system of random congruences modulo 2 considered in the next chapter. An analogue of Theorem 2.3.8 for bipartite graphs was proved by Saltykov [131].

The nonequiprobable version of the graph $G_{n,T}$ is considered in Section 2.4, where the results of the papers [88, 66, 65] are presented. Here we use the method of moments. The lack of regular methods for an asymptotic analysis of nonequiprobable graphs makes it impossible to carry out anything approaching a complete investigation of such graphs. It seems to us that developing the methods appropriate for the analysis of nonequiprobable combinatorial structures is a problem of great importance.
Systems of random linear equations in GF(2)

3.1. Rank of a matrix and critical sets

In this section, we consider systems of linear equations in GF(2), the field with elements 0 and 1. Let us begin with two examples where such systems appear.

Consider first a simple classification problem. Suppose we have a set of $n$ objects of two sorts, for example, of two different weights. We may sequentially sample pairs of the objects from the set at random, compare the weights of the objects from the chosen pair, and determine whether the weights are identical or different. The problem is to identify the objects that have the same weight - actually, to estimate the probability of finding that solution.

For a formal description of the situation, let $\{1, 2, \dots, n\}$ be the set of objects under consideration and let $x_j$ be the unknown type of the object $j$, $j = 1, \dots, n$. We may assume that $x_1, \dots, x_n$ take the values 0 and 1, depending on the class to which the object belongs. We choose a pair of objects $i(t)$ and $j(t)$ in the trial with number $t$, $t = 1, \dots, T$, and let $b_t$ be the result of their comparison: $b_t = 0$ if their weights are identical, and $b_t = 1$ otherwise. Thus, the results of the comparisons can be written as the following system of linear equations in GF(2):

$$x_{i(t)} + x_{j(t)} = b_t, \quad t = 1, \dots, T. \tag{3.1.1}$$

It is clear that the system can be rewritten in the matrix form

$$AX = B,$$

where $X = (x_1, \dots, x_n)$ and $B = (b_1, \dots, b_T)$ are column-vectors, and the elements $a_{tj}$ of the matrix $A = \|a_{tj}\|$, $t = 1, \dots, T$, $j = 1, \dots, n$, are random variables whose distribution is determined by the sampling procedure. It is convenient to associate the system, or more precisely, the matrix $A$, with the random graph $G_{n,T}$ with $n$ vertices that correspond to the variables $x_1, \dots, x_n$. The graph has $T$ edges $(i(t), j(t))$, $t = 1, \dots, T$. Therefore the graph can have loops and multiple edges, depending on the sampling procedure.
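The passage from pairwise comparisons to the system (3.1.1) and its matrix form can be sketched in code. This is only an illustration under stated assumptions: the function names are hypothetical, objects are labeled 0, ..., n-1, and each trial samples two distinct objects uniformly, which is one possible sampling procedure (it produces no loops).

```python
import random


def build_system(x, T, rng):
    """Sample T pairs of distinct objects and record the comparison results.

    x is the hidden 0/1 type vector; row t of A has ones in columns i(t), j(t),
    and b_t = x_i(t) + x_j(t) in GF(2).
    """
    n = len(x)
    A, B, edges = [], [], []
    for _ in range(T):
        i, j = rng.sample(range(n), 2)  # assumption: the two objects are distinct
        row = [0] * n
        row[i] = row[j] = 1
        A.append(row)
        B.append((x[i] + x[j]) % 2)
        edges.append((i, j))
    return A, B, edges


def satisfies(A, X, B):
    """Check AX = B over GF(2)."""
    return all(sum(a * v for a, v in zip(row, X)) % 2 == b for row, b in zip(A, B))


if __name__ == "__main__":
    x = [0, 1, 1, 0, 1]
    A, B, edges = build_system(x, 8, random.Random(0))
    print(edges)
    print(satisfies(A, x, B), satisfies(A, [1 - v for v in x], B))
```

Because every row of A contains exactly two ones in this setup, flipping all coordinates of a solution gives another solution, so the hidden vector and its complement both satisfy the system.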
In this chapter, we consider the characteristics of the graph $G_{n,T}$ that are related to some of the properties of the system (3.1.1). It is clear that the connectedness of the graph is an important characteristic for the classification problem. Indeed, in the case where the graph is connected, we can determine all values of the variables $x_1, \dots, x_n$ if we set one of them equal to 0 or 1. In both cases, the partitions of the set are the same, but the system has two different solutions. In the case where the graph $G_{n,T}$ is disconnected, the system has more than two solutions; therefore a complete classification is impossible.

Now let the vector $B$ consist of independent random variables that take the values 0 and 1. If the balance is out of order, the weighings can sometimes be wrong, and the variables $b_1, \dots, b_T$ can differ from the true values. In this case, we obtain a system with distorted entries on the right-hand side that sometimes has no solution. If the balance is completely wrong, we may assume that the variables $b_1, \dots, b_T$ do not depend on the left-hand side of the system and take the values 0 and 1 with equal probabilities. In this situation, several natural problems arise. Does the right-hand side $b_1, \dots, b_T$ depend on the left-hand side of the system or are the sides independent? Can we reconstruct the real values of $x_1, \dots, x_n$ in the case where the right-hand parts $b_1, \dots, b_T$ are distorted?

Let us turn to the second example. Let a vector $(c_1, \dots, c_n)$ in GF(2) be given. If we take an initial vector $x_1, \dots, x_n$, then we can develop the recurring sequence $x_{n+t}$, $t = 1, 2, \dots$, by the following recurrence relation:

$$x_{n+t} = c_1 x_t + c_2 x_{t+1} + \dots + c_n x_{n+t-1}, \quad t = 1, 2, \dots. \tag{3.1.2}$$

This recurrence relation can be realized with the help of a device called a shift register, presented in Figure 3.1.1. A shift register consists of $n$ cells or stages with labels $1, 2, \dots, n$.
The $n$-dimensional (0, 1) vector of the contents of these stages is called the state of the shift register. At an initial moment, the state of the shift register under consideration is the vector $(x_1, \dots, x_n)$. The choice of the vector $(c_1, \dots, c_n)$ means that we choose the stages with numbers corresponding to the ones in the sequence $c_1, \dots, c_n$ and form the mod 2 sum $x_{n+1} = c_1x_1 + \dots + c_nx_n$. At the next moment, the contents of all stages are shifted to the left so that $x_n$ transfers to the stage numbered $n - 1$, $x_{n-1}$ transfers to the stage $n - 2$, and so on, $x_1$ leaves the register, and the sum $x_{n+1} = c_1x_1 + \dots + c_nx_n$ is placed into the stage with label $n$. Thus the state $(x_1, \dots, x_n)$ transfers to the state $(x_2, \dots, x_{n+1})$.

Figure 3.1.1. Shift register
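The operation of the shift register described above can be sketched as a short function. The name `shift_register` and the list representation of the state are assumptions made for illustration; the feedback vector c and the state follow the recurrence with all arithmetic mod 2.

```python
def shift_register(c, state, steps):
    """Generate x_{n+1}, ..., x_{n+steps} for feedback vector c over GF(2)."""
    state = list(state)
    out = []
    for _ in range(steps):
        # mod 2 sum of the stages selected by the ones in c
        new = sum(ci * xi for ci, xi in zip(c, state)) % 2
        out.append(new)
        # shift left: x_1 leaves the register, the new bit enters stage n
        state = state[1:] + [new]
    return out


if __name__ == "__main__":
    print(shift_register([1, 1, 0], [1, 0, 0], 7))  # prints [1, 0, 1, 1, 1, 0, 0]
```

In this small example with n = 3 the register returns to its initial state (1, 0, 0) after seven steps, so the output sequence is periodic with period 7.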
The process is repeated. Thus, if $c_1, \dots, c_n$ are given, then for any initial state $x_1, \dots, x_n$, the recurring sequence (3.1.2) satisfies

$$\begin{aligned} x_{n+1} &= c_1x_1 + \dots + c_nx_n, \\ x_{n+2} &= c_1x_2 + \dots + c_nx_{n+1}, \\ &\;\;\vdots \\ x_{n+T} &= c_1x_T + \dots + c_nx_{n+T-1}. \end{aligned}$$

Let us change the notations and put $b_t = x_{n+t}$, $t = 1, 2, \dots, T$, and $a_{11} = c_1, \dots, a_{1n} = c_n$. Then the first relation becomes

$$a_{11}x_1 + \dots + a_{1n}x_n = b_1.$$

It is clear that we can substitute $c_1x_1 + \dots + c_nx_n$ for $x_{n+1}$ in the second relation and obtain

$$a_{21}x_1 + \dots + a_{2n}x_n = b_2.$$

In the same way, we obtain

$$\begin{aligned} a_{11}x_1 + \dots + a_{1n}x_n &= b_1, \\ &\;\;\vdots \\ a_{T1}x_1 + \dots + a_{Tn}x_n &= b_T. \end{aligned} \tag{3.1.3}$$

Suppose that the initial state $(x_1, \dots, x_n)$ is unknown and we observe the sequence $b_1, \dots, b_T$. Then we can regard relations (3.1.3) as a system of linear equations with respect to the unknowns $x_1, \dots, x_n$. A natural question is how many observations are needed to reconstruct the initial state and to obtain all elements of the sequence $b_t$, $t = T + 1, T + 2, \dots$.

The other situation concerns the feedback points $c_1, \dots, c_n$. Suppose we observe the sequence $b_1, \dots, b_T$, but the vector $(c_1, \dots, c_n)$ determining the shift register is unknown. If the number of 1's in $(c_1, \dots, c_n)$ is $k$, then there are $\binom{n}{k}$ possibilities for this vector. If we use an exhaustive search to find the true vector that corresponds to the observed sequence, we have the following situation. If the chosen vector is true, then system (3.1.3) is consistent for any $T$, but if the vector $(c_1, \dots, c_n)$ is wrong, then the system becomes inconsistent for some $T$. Therefore the consistency of the system (3.1.3) serves as a test for selecting the true vector.

Let us introduce the auxiliary notions of a critical set and a hypercycle for our investigations of systems of linear equations in GF(2). Note that the ordinary notions of linear algebra, such as the notion of linear independence of vectors, rank of a matrix, Cramer's rule for finding the solutions of linear systems of equations,
and so on, are extended in the obvious way to the $n$-dimensional vector space over GF(2). For example, if the rank of a $T \times n$ matrix $A = \|a_{tj}\|$ in GF(2) is $r$, then the homogeneous system of equations

$$AX = 0,$$

where $X = (x_1, \dots, x_n)$ is the column-vector of unknowns, has exactly $n - r$ linearly independent solutions.

Denote by $a_t = (a_{t1}, \dots, a_{tn})$, $t = 1, \dots, T$, the rows of the matrix $A$. If the coordinate-wise sum

$$a_{t_1} + \dots + a_{t_m} = 0$$

in GF(2), then the set $C = \{t_1, \dots, t_m\}$ of row indices is called a critical set.

If $C_1$ and $C_2$ are critical sets and $C_1 \ne C_2$, then

$$C_1 \,\Delta\, C_2 = (C_1 \cup C_2) \setminus (C_1 \cap C_2)$$

is also a critical set.

Let $\varepsilon_1, \dots, \varepsilon_s$ take the values 0 and 1. Critical sets $C_1, \dots, C_s$ are called independent if

$$\varepsilon_1 C_1 \,\Delta\, \varepsilon_2 C_2 \,\Delta \cdots \Delta\, \varepsilon_s C_s = \emptyset$$

if and only if $\varepsilon_1 = \dots = \varepsilon_s = 0$.

Denote by $s(A)$ the maximum number of independent critical sets and by $r(A)$ the rank of the matrix $A$.

Theorem 3.1.1. For any $T \times n$ matrix $A$ in GF(2),

$$s(A) + r(A) = T.$$

Proof. We consider the homogeneous system of equations

$$A'Y = 0 \tag{3.1.4}$$

in GF(2), where $A'$ is the transpose of $A$. There is a one-to-one correspondence between the solutions of the system (3.1.4) and the critical sets: The solution $Y_{t_1, \dots, t_m} = (y_1, \dots, y_T)$, whose components $y_{t_1}, \dots, y_{t_m}$ are 1 and the other components are zero, corresponds to the critical set $C = \{t_1, \dots, t_m\}$. The linear independence of solutions corresponds to the independence of critical sets. Therefore the maximum number of critical sets $s(A)$ equals the maximum number of linearly independent solutions of system (3.1.4), which we know is $T - r(A)$.
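Theorem 3.1.1 can be checked by brute force on small matrices. In the sketch below, which is an illustration with hypothetical function names, each row of A is encoded as an integer bit mask, the rank over GF(2) is found by elimination, and all subsets of rows whose coordinate-wise sum is zero are counted. Together with the empty set, such subsets are closed under symmetric difference, so their number equals $2^{s(A)}$ and should equal $2^{T - r(A)}$.

```python
def gf2_rank(rows):
    """Rank over GF(2) of rows given as integer bit masks."""
    pivots = {}  # leading bit position -> stored row with that leading bit
    for row in rows:
        while row:
            lead = row.bit_length() - 1
            if lead not in pivots:
                pivots[lead] = row
                break
            row ^= pivots[lead]  # eliminate the leading bit
    return len(pivots)


def count_zero_sum_subsets(rows):
    """Count subsets of rows (empty set included) whose XOR is the zero vector."""
    T = len(rows)
    count = 0
    for mask in range(1 << T):
        acc = 0
        for t in range(T):
            if mask >> t & 1:
                acc ^= rows[t]
        if acc == 0:
            count += 1
    return count


if __name__ == "__main__":
    rows = [0b110, 0b011, 0b101, 0b000]  # a 4 x 3 matrix over GF(2)
    r = gf2_rank(rows)
    print(r, count_zero_sum_subsets(rows), 2 ** (len(rows) - r))
```

The exhaustive count takes time exponential in T, so this check is meant only for small examples.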
In addition to the critical sets of a $T \times n$ matrix $A = \|a_{tj}\|$, we consider a hypergraph $G_A$ that is also defined by the matrix $A$. The set of vertices of the hypergraph $G_A$ is the set $\{1, \dots, n\}$ of column indices and the set of enumerated hyperedges is the set $\{e_1, \dots, e_T\}$, where $e_t = \{j : a_{tj} = 1\}$, $t = 1, \dots, T$. Thus there exists a correspondence between a row $a_t = (a_{t1}, \dots, a_{tn})$ and the hyperedge $e_t$, $t = 1, \dots, T$. Note that the empty set corresponds to a row consisting of zeros.

The multiplicity of a vertex $j$ in a set of hyperedges $C = \{e_{t_1}, \dots, e_{t_m}\}$ is the number of hyperedges in $C$ that contain this vertex.

A set of hyperedges $C = \{e_{t_1}, \dots, e_{t_m}\}$ is called a hypercycle if each vertex of the hypergraph $G_A$ has an even multiplicity in $C$, in other words, if the coordinate-wise sum of rows $a_{t_1} + \dots + a_{t_m}$ in GF(2) equals the zero vector.

If each row of the matrix $A$ contains exactly two 1's, then the hypergraph $G_A$ is an ordinary graph, perhaps with multiple edges, and a hypercycle is an ordinary cycle or a union of cycles. The set of the indices of hyperedges that form a hypercycle is a critical set for the matrix $A$.

Let $\varepsilon_1, \dots, \varepsilon_s$ take the values 0 and 1. Hypercycles $C_1, \dots, C_s$ are independent if

$$\varepsilon_1 C_1 \,\Delta\, \varepsilon_2 C_2 \,\Delta \cdots \Delta\, \varepsilon_s C_s = \emptyset$$

if and only if $\varepsilon_1 = \dots = \varepsilon_s = 0$. Therefore the maximum number $s(A)$ of critical sets of the matrix $A$ equals the maximum number of independent hypercycles in $G_A$.

3.2. Matrices with independent elements

This section deals with random matrices with independent elements. Let $A = \|a_{tj}\|$ be a $T \times n$ matrix whose elements are independent random variables taking the values 0 and 1 with equal probabilities, and let $\rho_n(T)$ be the rank of the matrix $A$ in GF(2). The following theorem is the main result of this section.

Theorem 3.2.1. Let $s \ge 0$ and $m$ be fixed integers, $m + s \ge 0$. If $n \to \infty$ and $T = n + m$, then

$$\mathbf{P}\{\rho_n(T) = n - s\} \to 2^{-s(m+s)} \prod_{i=s+1}^{\infty} \left(1 - \frac{1}{2^i}\right) \prod_{i=1}^{m+s} \left(1 - \frac{1}{2^i}\right)^{-1},$$

where the last product equals 1 for $m + s = 0$.

Proof.
The limit theorem will be proved by using an explicit formula for $P\{\rho_n(T) = n - s\}$. Denote by $\rho_n(t)$ the rank of the submatrix of $A$ that consists
of the first $t$ rows of the matrix $A$. We interpret the parameter $t$ as time and consider the process of sequential growth of the number of rows. Let $\xi_t = 1$ if the rank $\rho_n(t-1)$ increases after joining the $t$th row, and $\xi_t = 0$ if the rank preserves the previous value. It is clear that
$$\rho_n(T) = \xi_1 + \cdots + \xi_T.$$

It is not difficult to describe the probabilistic properties of the random variables $\xi_1, \ldots, \xi_T$. The event $\{\xi_t = 1\}$ means that the $t$th row is linearly independent of the rows with numbers $1, \ldots, t-1$, and the event $\{\xi_t = 0\}$ means that the row with number $t$ is a linear combination of the preceding rows. If among the preceding $t - 1$ rows there are exactly $k$ linearly independent $n$-dimensional vectors, then the linear span of these $k$ vectors contains $2^k$ vectors (all linear combinations of these $k$ vectors). The matrix $A$ is constructed in such a way that each row can be obtained by sampling with replacement from a box containing all $2^n$ distinct $n$-dimensional vectors. In other words, any row of the matrix $A$ is independent of all other rows and is equal to any $n$-dimensional vector with probability $2^{-n}$. Therefore
$$P\{\xi_t = 1 \mid \rho_n(t-1) = k\} = 1 - \frac{2^k}{2^n}, \qquad P\{\xi_t = 0 \mid \rho_n(t-1) = k\} = \frac{2^k}{2^n}. \qquad (3.2.1)$$

Thus the process $\rho_n(t)$ is a Markov chain with stationary transition probabilities that are given by (3.2.1). To find $P\{\rho_n(T) = n - s\}$, we can sum the probabilities of all trajectories of the Markov chain that lead from the origin to the point with coordinates $(n + m, n - s)$, that is, the trajectories such that $\rho_n(0) = 0$, $\rho_n(n+m) = n - s$. If we represent a trajectory as a "broken line" with intervals of growth and horizontal intervals, we see that any such broken line has exactly $n + m - (n - s) = m + s$ horizontal intervals corresponding to $m + s$ zeros among the values of $\xi_1, \ldots, \xi_{n+m}$. The graph of the trajectory with $\xi_{t_1} = 0, \ldots, \xi_{t_{m+s}} = 0$ is illustrated in Figure 3.2.1.
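The Markov chain described above can be iterated exactly for moderate $n$. The following sketch is an illustration only, with a hypothetical function name. It assumes the transition rule that a new row falls into the span of the previous rows with probability $2^k/2^n$ when the current rank is $k$, and it uses exact rational arithmetic.

```python
from fractions import Fraction

def rank_distribution(n, T):
    """Exact distribution of the rank of a uniform random T x n matrix over GF(2)."""
    dist = {0: Fraction(1)}  # rank 0 before any row is added
    for _ in range(T):
        nxt = {}
        for k, p in dist.items():
            stay = Fraction(2 ** k, 2 ** n)  # new row lies in the span: rank unchanged
            nxt[k] = nxt.get(k, 0) + p * stay
            if k < n:
                nxt[k + 1] = nxt.get(k + 1, 0) + p * (1 - stay)
        dist = nxt
    return dist

if __name__ == "__main__":
    d = rank_distribution(20, 20)
    for s in range(4):
        print(s, float(d[20 - s]))
```

For $T = n$ the printed values give $P\{\rho_n(n) = n - s\}$ for $s = 0, 1, 2, 3$ and can be compared with the limit stated in Theorem 3.2.1 with $m = 0$.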
By using (3.2.1) and Figure 3.2.1, we can easily write an explicit formula for the probability of a particular trajectory and for the total probability. The derivation of this probability is quite simple if $m + s = 0$. Indeed, the only trajectory with $\rho_n(0) = 0$ and $\rho_n(n+m) = n + m$ has no horizontal intervals, and at each interval the broken line increases; therefore
$$P\{\rho_n(n+m) = n - s\} = \Big(1 - \frac{1}{2^n}\Big)\Big(1 - \frac{2}{2^n}\Big) \cdots \Big(1 - \frac{2^{n+m-1}}{2^n}\Big) = \prod_{i=-m+1}^{n} \Big(1 - \frac{1}{2^i}\Big),$$
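and in the case $m + s = 0$, as $n \to \infty$,
$$\prod_{i=s+1}^{n} \Big(1 - \frac{1}{2^i}\Big) \to \prod_{i=s+1}^{\infty} \Big(1 - \frac{1}{2^i}\Big).$$
This coincides with the assertion of the theorem for $m + s = 0$ because the last product equals 1.

In the general case, for $m + s > 0$,
$$P\{\rho_n(n+m) = n - s\} = \prod_{i=s+1}^{n} \Big(1 - \frac{1}{2^i}\Big) \sum_{1 \le t_1 < \cdots < t_{m+s} \le n+m} \frac{2^{(t_1 - 1) + (t_2 - 2) + \cdots + (t_{m+s} - m - s)}}{2^{n(m+s)}}.$$

Figure 3.2.1. Graph of the trajectory with $\xi_{t_1} = \cdots = \xi_{t_{m+s}} = 0$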
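Taking the factor $2^{(n-s)(m+s)}$ out of the sum yields
$$P\{\rho_n(n+m) = n - s\} = 2^{-s(m+s)} \prod_{i=s+1}^{n} \Big(1 - \frac{1}{2^i}\Big) \sum_{1 \le t_1 < \cdots < t_{m+s} \le n+m} 2^{(t_1 - 1 - n + s) + \cdots + (t_{m+s} - m - s - n + s)}.$$

As will be seen from the following evaluations, the moments $t_1, \ldots, t_{m+s}$ are concentrated at the end of the trajectory; therefore, in the sum of the formula, it is convenient to switch to the variables $i_l = -(t_l - l + s - n)$, $l = 1, \ldots, m + s$. It follows from $1 \le t_1 < \cdots < t_{m+s} \le n + m$ that
$$0 \le t_1 - 1 \le t_2 - 2 \le \cdots \le t_{m+s} - m - s \le n - s,$$
and by subtracting $n - s$ from each term, we obtain
$$-n + s \le t_1 - 1 - n + s \le \cdots \le t_{m+s} - m - s - n + s \le 0.$$
If we change the sign, we see that the domain $1 \le t_1 < \cdots < t_{m+s} \le n + m$ in terms of the new variables is
$$0 \le i_{m+s} \le \cdots \le i_1 \le n - s.$$
Thus
$$P\{\rho_n(n+m) = n - s\} = 2^{-s(m+s)} \prod_{i=s+1}^{n} \Big(1 - \frac{1}{2^i}\Big) \sum_{0 \le i_{m+s} \le \cdots \le i_1 \le n-s} 2^{-i_1 - \cdots - i_{m+s}}. \qquad (3.2.2)$$

It is easily seen that, as $n \to \infty$,
$$\prod_{i=s+1}^{n} \Big(1 - \frac{1}{2^i}\Big) \to \prod_{i=s+1}^{\infty} \Big(1 - \frac{1}{2^i}\Big) \qquad (3.2.3)$$
and
$$\sum_{0 \le i_{m+s} \le \cdots \le i_1 \le n-s} 2^{-i_1 - \cdots - i_{m+s}} \to \sum_{0 \le i_{m+s} \le \cdots \le i_1 < \infty} 2^{-i_1 - \cdots - i_{m+s}}. \qquad (3.2.4)$$

To complete the proof it remains to transform the right-hand side of (3.2.4). It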
is not difficult to see that
$$\sum_{0 \le i_{m+s} \le \cdots \le i_1 < \infty} 2^{-i_1 - \cdots - i_{m+s}} = \prod_{i=1}^{m+s} \Big(1 - \frac{1}{2^i}\Big)^{-1}. \qquad (3.2.5)$$
Passing to the limit in (3.2.2) and taking into account (3.2.3), (3.2.4), and (3.2.5) provide the assertion of the theorem. ∎

Let the elements of a $T \times n$ matrix $A = \|a_{tj}\|$ be independent and take the values 0 and 1 with equal probabilities. We consider the system of equations
$$AX = 0 \qquad (3.2.6)$$
with respect to unknowns $X = (x_1, \ldots, x_n)$ in GF(2). Denote by $\nu_{n,T}$ the number of linearly independent solutions of this system of equations. If the rank $\rho_n(T)$ of the matrix $A$ equals $r$, then $\nu_{n,T} = n - r$. Therefore Theorem 3.2.1 yields the following assertion.

Theorem 3.2.2. Let $s \ge 0$ and $m$ be fixed integers, $m + s \ge 0$. If $n \to \infty$ and $T = n + m$, then
$$P\{\nu_{n,T} = s\} \to 2^{-s(m+s)} \prod_{i=s+1}^{\infty} \Big(1 - \frac{1}{2^i}\Big) \prod_{i=1}^{m+s} \Big(1 - \frac{1}{2^i}\Big)^{-1},$$
where the last product equals 1 for $m + s = 0$.

In particular, for $m = s = 0$,
$$P\{\nu_{n,n} = 0\} \to \prod_{i=1}^{\infty} \Big(1 - \frac{1}{2^i}\Big) = 0.288788\ldots.$$
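The limit in Theorems 3.2.1 and 3.2.2 is easy to evaluate numerically. The sketch below is an illustration under the assumption that the limit has the product form $2^{-s(m+s)} \prod_{i>s} (1 - 2^{-i}) \prod_{i \le m+s} (1 - 2^{-i})^{-1}$; the infinite product is truncated after a fixed number of factors, and the function name is hypothetical.

```python
def rank_limit(s, m, terms=200):
    """Truncated value of the limit of P{rho_n(n+m) = n - s} as n grows."""
    if s < 0 or m + s < 0:
        raise ValueError("need s >= 0 and m + s >= 0")
    tail = 1.0
    for i in range(s + 1, s + 1 + terms):  # truncated infinite product
        tail *= 1.0 - 2.0 ** -i
    head = 1.0
    for i in range(1, m + s + 1):          # empty product equals 1 when m + s = 0
        head /= 1.0 - 2.0 ** -i
    return 2.0 ** (-s * (m + s)) * tail * head

if __name__ == "__main__":
    for s in range(5):
        print(s, rank_limit(s, 0))
    print(sum(rank_limit(s, 0) for s in range(12)))
```

For a square matrix ($m = 0$) the values over $s = 0, 1, 2, \ldots$ form a probability distribution, which gives a quick sanity check on the formula.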
The results of Theorems 3.2.1 and 3.2.2 are of special interest because they are stable in the sense that the limit distribution of the rank of a matrix is invariant with respect to deviations of the distributions of its elements from the equiprobable distribution.

Theorem 3.2.3. Let the elements of a $T \times n$ matrix $A = \|a_{tj}\|$ be independent and suppose there is a positive constant $\delta$ such that, for the probabilities $p_j^{(t)} = P\{a_{tj} = 1\}$, the inequalities
$$\delta \le p_j^{(t)} \le 1 - \delta$$
hold. Let $s \ge 0$ and $m$ be fixed integers, $m + s \ge 0$. Then, as $n \to \infty$,
$$P\{\rho_n(n+m) = n - s\} \to 2^{-s(m+s)} \prod_{i=s+1}^{\infty} \Big(1 - \frac{1}{2^i}\Big) \prod_{i=1}^{m+s} \Big(1 - \frac{1}{2^i}\Big)^{-1},$$
where the last product equals 1 for $m + s = 0$.

Because these results are outside of the main combinatorial direction of this book, we will omit the complicated proof of this theorem (see, e.g., [93]). We illustrate the situation by proving that, under the conditions of Theorem 3.2.3, the mean value of the number of nontrivial solutions of system (3.2.6) is invariant to deviations of the distributions of elements of $A$ from the equiprobable distribution.

Let $\mu_{n,T}$ be the number of nontrivial (i.e., nonzero) solutions of system (3.2.6). If we associate to the vector $X$ an indicator that is 1 if $X$ satisfies the system, then
$$E\mu_{n,T} = \sum_{X \ne 0} P\{AX = 0\}.$$
We will evaluate $E\mu_{n,T}$ by using the following lemma on summation of independent random variables in GF(2).

Lemma 3.2.1. Let $\xi_1, \ldots, \xi_n$ be independent random variables that take the values 0 and 1 with probabilities
$$P\{\xi_i = 1\} = \frac{1 - \Delta_i}{2}, \qquad P\{\xi_i = 0\} = \frac{1 + \Delta_i}{2}, \qquad i = 1, \ldots, n.$$
Then, in GF(2),
$$P\{\xi_1 + \cdots + \xi_n = 0\} = \frac{1 + \Delta_1 \cdots \Delta_n}{2}, \qquad P\{\xi_1 + \cdots + \xi_n = 1\} = \frac{1 - \Delta_1 \cdots \Delta_n}{2}.$$

Proof. It is clear that it suffices to prove the assertion of the lemma for $n = 2$. In
that case,
$$P\{\xi_1 + \xi_2 = 0\} = P\{\xi_1 = 0, \xi_2 = 0\} + P\{\xi_1 = 1, \xi_2 = 1\} = \frac{(1 + \Delta_1)(1 + \Delta_2)}{4} + \frac{(1 - \Delta_1)(1 - \Delta_2)}{4} = \frac{1 + \Delta_1 \Delta_2}{2}. \ ∎$$

If the elements of $A$ are independent and take the values 0 and 1 with equal probabilities, then by Lemma 3.2.1, for any $X \ne 0$,
$$P\{AX = 0\} = \big(P\{a_{11} x_1 + \cdots + a_{1n} x_n = 0\}\big)^T = 2^{-T}.$$
Therefore $E\mu_{n,T} = (2^n - 1) 2^{-T}$, and for $T = n + m$, where $m$ is a fixed integer,
$$E\mu_{n,T} = \frac{1}{2^m} - \frac{1}{2^{n+m}},$$
and as $n \to \infty$,
$$E\mu_{n,T} \to 2^{-m}.$$

Under some conditions on the nonequiprobable distribution of the matrix $A$, the last result still holds. Let $p_j^{(t)} = P\{a_{tj} = 1\}$, and, as before, denote by $\mu_{n,T}$ the number of nontrivial solutions of system (3.2.6).

Theorem 3.2.4. Under the conditions of Theorem 3.2.3,
$$E\mu_{n,T} \to 2^{-m}.$$

Proof. By using the indicators as in the calculation of the mean number of solutions in the equiprobable case, we find that
$$E\mu_{n,T} = \sum_{k=1}^{n} \sum_{1 \le j_1 < \cdots < j_k \le n} P_{j_1, \ldots, j_k}, \qquad (3.2.7)$$
where, for any fixed set $\{j_1, \ldots, j_k\}$ from the domain of summation, the term $P_{j_1, \ldots, j_k} = P\{AX = 0\}$ corresponds to the vector $X = (x_1, \ldots, x_n)$ whose elements with indices $j_1, \ldots, j_k$ are 1 and whose remaining elements are zero. We represent the probabilities $p_j^{(t)}$ as
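$$p_j^{(t)} = \frac{1 - \Delta_j^{(t)}}{2}.$$
According to the conditions of the theorem, there exists $\Delta < 1$ such that $|\Delta_j^{(t)}| \le \Delta$ for all $t$ and $j$. Since the rows of $A$ are independent,
$$P_{j_1, \ldots, j_k} = \prod_{t=1}^{T} P^{(t)}_{j_1, \ldots, j_k},$$
where
$$P^{(t)}_{j_1, \ldots, j_k} = P\{a_{t j_1} + \cdots + a_{t j_k} = 0\}.$$
By Lemma 3.2.1,
$$P\{a_{t j_1} + \cdots + a_{t j_k} = 0\} = \frac{1 + \Delta_{j_1}^{(t)} \cdots \Delta_{j_k}^{(t)}}{2},$$
and for all $t$ and $1 \le j_1 < \cdots < j_k \le n$,
$$\frac{1 - \Delta^k}{2} \le P^{(t)}_{j_1, \ldots, j_k} \le \frac{1 + \Delta^k}{2}.$$
Hence, for $P_{j_1, \ldots, j_k}$, we obtain the bounds
$$\Big(\frac{1 - \Delta^k}{2}\Big)^T \le P_{j_1, \ldots, j_k} \le \Big(\frac{1 + \Delta^k}{2}\Big)^T.$$
By using these inequalities, we find from (3.2.7) that
$$\sum_{k=1}^{n} \binom{n}{k} \Big(\frac{1 - \Delta^k}{2}\Big)^T \le E\mu_{n,T} \le \sum_{k=1}^{n} \binom{n}{k} \Big(\frac{1 + \Delta^k}{2}\Big)^T. \qquad (3.2.8)$$

Now let $T = n + m$, where $m$ is a fixed integer. The left and the right sides of (3.2.8) can be estimated in the same way. Therefore we obtain only an estimate of the right-hand side. Let
$$S(\Delta) = \sum_{k=1}^{n} \binom{n}{k} \Big(\frac{1 + \Delta^k}{2}\Big)^{n+m}$$
and compare $S(\Delta)$ to
$$S(0) = \sum_{k=1}^{n} \binom{n}{k} \frac{1}{2^{n+m}}.$$
We have seen that $S(0) \to 2^{-m}$ as $n \to \infty$. We show that for any fixed $\Delta$, $0 < \Delta < 1$, the difference $S(\Delta) - S(0)$ tends to zero. We divide $S(\Delta)$ into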
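two parts:
$$S_1(\Delta) = \sum_{1 \le k \le \varepsilon n} \binom{n}{k} \Big(\frac{1 + \Delta^k}{2}\Big)^{n+m}, \qquad S_2(\Delta) = \sum_{\varepsilon n < k \le n} \binom{n}{k} \Big(\frac{1 + \Delta^k}{2}\Big)^{n+m},$$
where $\varepsilon$, $0 < \varepsilon < 1/2$, will be chosen later. For the sake of simplicity, suppose that $\varepsilon$ is such that $\varepsilon n$ is an integer; then for any $\varepsilon$ and $\Delta$, $0 < \Delta < 1$, $0 < \varepsilon < 1/2$,
$$S_1(\Delta) \le \varepsilon n \binom{n}{\varepsilon n} \Big(\frac{1 + \Delta}{2}\Big)^{n+m} \le \varepsilon n \, \frac{n^{\varepsilon n}}{(\varepsilon n)^{\varepsilon n} e^{-\varepsilon n}} \Big(\frac{1 + \Delta}{2}\Big)^{n+m}$$
by using the inequality $n! \ge n^n e^{-n}$. This bound for $S_1(\Delta)$ can be written as
$$S_1(\Delta) \le \varepsilon n \Big(\frac{1 + \Delta}{2}\Big)^m \Big(\frac{1 + \Delta}{2 \varepsilon^{\varepsilon} e^{-\varepsilon}}\Big)^n.$$
If we choose a sufficiently small $\varepsilon$, we can make the value $(1 + \Delta)/(2 \varepsilon^{\varepsilon} e^{-\varepsilon})$ less than 1. For such $\varepsilon$, the bound tends to zero as $n \to \infty$. Thus, there exists a fixed $\varepsilon$, $0 < \varepsilon < 1/2$, such that the value $S_1(\Delta)$ and, consequently, $S_1(0)$ tend to zero, and $S_1(\Delta) - S_1(0) \to 0$.

We now estimate the difference $S_2(\Delta) - S_2(0)$. It is clear that
$$0 \le S_2(\Delta) - S_2(0) = \sum_{\varepsilon n < k \le n} \binom{n}{k} \frac{1}{2^{n+m}} \big((1 + \Delta^k)^{n+m} - 1\big) \le \big((1 + \Delta^{\varepsilon n})^{n+m} - 1\big) \frac{1}{2^{n+m}} \sum_{\varepsilon n < k \le n} \binom{n}{k}.$$
Since $(1 + \Delta^{\varepsilon n})^{n+m} \to 1$ as $n \to \infty$, it follows from the estimate obtained above that $S_2(\Delta) - S_2(0) \to 0$.

Thus we have shown that $S(\Delta) - S(0) \to 0$ and $S(0) \to 2^{-m}$; hence, $S(\Delta) \to 2^{-m}$. Theorem 3.2.4 is thus proved. ∎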
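Both ingredients of this proof can be tried numerically. The sketch below is illustrative only. It assumes Lemma 3.2.1 in the form $P\{\xi_1 + \cdots + \xi_n = 0\} = (1 + \Delta_1 \cdots \Delta_n)/2$ with $P\{\xi_i = 1\} = (1 - \Delta_i)/2$, and it takes the upper bound to be $S(\Delta) = \sum_{k=1}^{n} \binom{n}{k} \big((1 + \Delta^k)/2\big)^{n+m}$. The function names are hypothetical.

```python
from itertools import product
from math import comb

def prob_even_sum(deltas):
    """P{xi_1 + ... + xi_n = 0 in GF(2)} by enumerating all outcomes."""
    total = 0.0
    for bits in product((0, 1), repeat=len(deltas)):
        if sum(bits) % 2 == 0:
            p = 1.0
            for b, d in zip(bits, deltas):
                p *= (1 - d) / 2 if b else (1 + d) / 2
            total += p
    return total

def S(delta, n, m):
    """Upper bound for the mean number of nontrivial solutions when T = n + m."""
    T = n + m
    return sum(comb(n, k) * ((1 + delta ** k) / 2) ** T for k in range(1, n + 1))

if __name__ == "__main__":
    print(prob_even_sum([0.3, -0.2, 0.5]), (1 + 0.3 * -0.2 * 0.5) / 2)
    for n in (20, 60, 200):
        print(n, S(0.5, n, 1), S(0.0, n, 1))
```

With $m = 1$ both columns approach $2^{-1}$ as $n$ grows, in line with $S(\Delta) - S(0) \to 0$.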
We can actually relax the hypotheses of Theorem 3.2.4. The result remains true if for $t = 1, \ldots, T$, $j = 1, \ldots, n$,
$$\frac{\log n + x_n}{n} \le p_j^{(t)} \le 1 - \frac{\log n + x_n}{n},$$
where $x_n$ tends to infinity arbitrarily slowly (see [93]). These bounds are exact in a sense because, as we will show in the next section, the limit distribution of the rank of a matrix $A$ differs from the distribution given in Theorem 3.2.1 if the probability of 1's does not satisfy these inequalities.

3.3. Rank of sparse matrices

In Section 3.1, we introduced the notion of critical sets of a matrix. Recall that a set $\{t_1, \ldots, t_m\}$ of row indices of a matrix in GF(2) is called critical if the coordinate-wise sum of rows with indices $t_1, \ldots, t_m$ is the zero vector. The notion of independence of critical sets was also introduced, and $s(A)$ denoted the maximum number of independent critical sets of a matrix $A$. According to Theorem 3.1.1, the rank $r(A)$ of a matrix $A$ is related to $s(A)$ by the equality $s(A) + r(A) = T$. Therefore, instead of the rank of a matrix, we can investigate the maximum number $s(A)$ of independent critical sets of the matrix. In this section, critical sets are applied in the analysis of the rank of random sparse matrices.

Let the elements of a $T \times n$ matrix $A = \|a_{tj}\|$ be independent random variables such that
$$P\{a_{tj} = 1\} = \frac{\log n + x}{n}, \qquad P\{a_{tj} = 0\} = 1 - \frac{\log n + x}{n}, \qquad (3.3.1)$$
where $x$ is a constant, $t = 1, \ldots, T$, $j = 1, \ldots, n$. We find the limit distribution of $s(A)$ for such a matrix.

Theorem 3.3.1. If $n, T \to \infty$ such that $T/n \to a$, $0 < a < 1$, and condition (3.3.1) is valid, then the distribution of the maximum number of independent critical sets $s(A)$ converges to the Poisson distribution with parameter $\lambda = a e^{-x}$.

We show first that the distribution of the number of critical sets that correspond to zero rows of the matrix converges to a Poisson distribution. Denote the number of zero rows of the matrix $A$ by $\xi_{n,T}$.

Lemma 3.3.1.
If $n, T \to \infty$ such that $T/n \to a$, $0 < a < \infty$, and condition (3.3.1) is valid, then for any fixed $k = 0, 1, \ldots$,
$$P\{\xi_{n,T} = k\} \to \frac{\lambda^k e^{-\lambda}}{k!},$$
where $\lambda = a e^{-x}$.
Proof. The probability $p_n$ that a fixed row consists entirely of zeros is
$$p_n = \Big(1 - \frac{\log n + x}{n}\Big)^n,$$
and under the conditions of the lemma,
$$p_n = \frac{1}{n} e^{-x} (1 + o(1)).$$
The random variable $\xi_{n,T}$ has the binomial distribution with parameters $(T, p_n)$, where $T$ is the number of trials and $p_n$ is the probability of success. Under the conditions of the lemma, the mean number of successes $T p_n$ tends to $a e^{-x}$; hence, the binomial distribution converges to the Poisson distribution with parameter $a e^{-x}$. ∎

We now prove that if $a < 1$, then with probability tending to 1, all critical sets consist of only zero rows.

Lemma 3.3.2. If $n, T \to \infty$ such that $T/n \to a$, $a < 1$, and condition (3.3.1) is valid, then with probability tending to 1, the critical sets of $A$ consist of only zero rows.

Proof. We consider the total number of critical sets each of which contains at least one nonzero row. It is sufficient to prove that the mathematical expectation of this number tends to zero. Although the proof of this fact is straightforward, it involves many cumbersome estimations of sums containing the binomial coefficients.

An even number of successes among $k$ independent trials with probability of success $p$ occurs with probability $(1 + (q - p)^k)/2$, where $q = 1 - p$. Let us find the probability that $k$ fixed rows form a critical set containing a nonzero row. The indices of these rows form a critical set if each column of the submatrix formed by these rows contains an even number of 1's. According to the remark on the probability that the number of successes is even, this probability equals
$$\frac{1}{2} \Big(1 + \Big(1 - \frac{2(\log n + x)}{n}\Big)^k\Big)$$
for each column. Therefore the probability that these $k$ rows constitute a critical set equals
$$\frac{1}{2^n} \Big(1 + \Big(1 - \frac{2(\log n + x)}{n}\Big)^k\Big)^n.$$
Note that the probability that there is no 1 in all these $k$ rows is equal to
$$\Big(1 - \frac{\log n + x}{n}\Big)^{kn}.$$
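The two elementary facts used so far can be checked directly. The sketch below is only an illustration with hypothetical function names: it compares the parity formula $(1 + (q - p)^k)/2$ with a direct binomial sum, and it evaluates the mean number of zero rows $T p_n$ under the assumption $P\{a_{tj} = 1\} = (\log n + x)/n$.

```python
from math import comb, exp, log

def prob_even_successes(k, p):
    """P{even number of successes in k Bernoulli(p) trials}, by direct summation."""
    return sum(comb(k, j) * p ** j * (1 - p) ** (k - j) for j in range(0, k + 1, 2))

def expected_zero_rows(n, T, x):
    """T * p_n, where p_n = (1 - (log n + x)/n)^n is the probability of a zero row."""
    p = (log(n) + x) / n
    return T * (1 - p) ** n

if __name__ == "__main__":
    k, p = 7, 0.3
    print(prob_even_successes(k, p), (1 + (1 - 2 * p) ** k) / 2)
    a, x = 0.5, 1.0
    for n in (10 ** 3, 10 ** 5, 10 ** 6):
        print(n, expected_zero_rows(n, int(a * n), x), a * exp(-x))
```

The second loop shows $T p_n$ approaching $a e^{-x}$, the Poisson parameter of Lemma 3.3.1.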
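By using the corresponding indicators to represent the total number of nontrivial critical sets and the number of the critical sets that consist of zero rows, we obtain the following expression for the mean number of critical sets that do not consist of zero rows:
$$\sum_{k=1}^{T} \binom{T}{k} \frac{r_k^n}{2^n} - \sum_{k=1}^{T} \binom{T}{k} \Big(1 - \frac{\log n + x}{n}\Big)^{kn},$$
where
$$r_k = 1 + \Big(1 - \frac{2(\log n + x)}{n}\Big)^k.$$
We include the terms with $k = 0$ into these sums because they cancel each other. Note first that
$$\sum_{k=0}^{T} \binom{T}{k} \Big(1 - \frac{\log n + x}{n}\Big)^{kn} = \Big(1 + \Big(1 - \frac{\log n + x}{n}\Big)^n\Big)^T,$$
and under the conditions of the lemma,
$$\Big(1 + \Big(1 - \frac{\log n + x}{n}\Big)^n\Big)^T \to e^{a e^{-x}}.$$

Now consider the sum
$$\sum_{k=0}^{T} \binom{T}{k} \frac{r_k^n}{2^n}.$$
Set $\alpha = 1 - 2(\log n + x)/n$ for now. The following equalities hold:
$$\sum_{k=0}^{T} \binom{T}{k} (1 + \alpha^k)^n = \sum_{k=0}^{T} \binom{T}{k} \sum_{i=0}^{n} \binom{n}{i} \alpha^{ki} = \sum_{i=0}^{n} \binom{n}{i} \sum_{k=0}^{T} \binom{T}{k} \alpha^{ik} = \sum_{i=0}^{n} \binom{n}{i} (1 + \alpha^i)^T.$$
Let
$$a_k = \binom{n}{k} \frac{1}{2^n} r_k^T, \qquad S(n, T) = \sum_{k=0}^{n} a_k,$$
and divide the sum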
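into five parts so that $S(n, T) = S_1 + S_2 + S_3 + S_4 + S_5$, where
$$S_1 = \sum_{0 \le k \le k_1} a_k, \quad S_2 = \sum_{k_1 < k \le k_2} a_k, \quad S_3 = \sum_{k_2 < k < k_3} a_k, \quad S_4 = \sum_{k_3 \le k \le k_4} a_k, \quad S_5 = \sum_{k_4 < k \le n} a_k,$$
$$k_1 = \varepsilon n, \qquad k_2 = \frac{n(1 - \varepsilon)}{2}, \qquad k_3 = \frac{n}{2} - \frac{1}{2} n^{3/5}, \qquad k_4 = \frac{n}{2} + \frac{1}{2} n^{3/5},$$
and the value of $\varepsilon$ will be chosen later. For convenience we present the graphs of the functions $\binom{n}{k} 2^{-n}$ and $r_k^T = (1 + (1 - 2(\log n + x)/n)^k)^T$ as functions of $k$ in Figure 3.3.1.

Figure 3.3.1. Graphs of the functions $\binom{n}{k} 2^{-n}$ and $r_k^T$

The major contribution to $S(n, T)$ is made by the sum $S_4$. It is clear that
$$\Big(1 - \frac{2(\log n + x)}{n}\Big)^k = \frac{1}{n} e^{-x} (1 + o(1))$$
uniformly in the integers $k = n/2 + u\sqrt{n}/2$ such that $|u| \le n^{1/10}$. These $k$ form the domain of summation of $S_4$, which equals $\{k : |u| \le n^{1/10}\}$. Therefore
$$r_k^T = \Big(1 + \Big(1 - \frac{2(\log n + x)}{n}\Big)^k\Big)^T = e^{a e^{-x}} (1 + o(1))$$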
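uniformly in $k$ in the domain of summation of $S_4$. Thus
$$S_4 = \sum_{k_3 \le k \le k_4} \binom{n}{k} \frac{1}{2^n} r_k^T \to e^{a e^{-x}},$$
since by the de Moivre-Laplace theorem,
$$\sum_{k_3 \le k \le k_4} \binom{n}{k} \frac{1}{2^n} \to 1.$$

We now have to show that the remaining four sums tend to zero. We begin with
$$S_5 = \sum_{k_4 < k \le n} a_k.$$
Since $r_k$ is monotone, we find that
$$S_5 \le r_{k_4}^T \sum_{k_4 < k \le n} \binom{n}{k} \frac{1}{2^n}.$$
Under the conditions of the lemma $S_5 \to 0$, since, as was proved, $r_{k_4}^T \to e^{a e^{-x}}$, and according to the de Moivre-Laplace theorem,
$$\sum_{k_4 < k \le n} \binom{n}{k} \frac{1}{2^n} \to 0.$$

Let us estimate
$$S_1 = \sum_{0 \le k \le \varepsilon n} a_k.$$
By using the monotonicity of $r_k$, we find that, for sufficiently small $\varepsilon$ such that $\varepsilon n$ is an integer,
$$S_1 \le (1 + \varepsilon n) \binom{n}{\varepsilon n} \frac{2^T}{2^n} \le (1 + \varepsilon n) \Big(\frac{2^{T/n}}{2 \varepsilon^{\varepsilon} e^{-\varepsilon}}\Big)^n.$$
It is clear that $2^{T/n} (2 \varepsilon^{\varepsilon} e^{-\varepsilon})^{-1} \le q < 1$ for sufficiently small $\varepsilon$; therefore $S_1 \to 0$ as $n \to \infty$.

It remains to consider $S_2$ and $S_3$. Let us begin with
$$S_2 = \sum_{\varepsilon n < k \le n(1 - \varepsilon)/2} a_k.$$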
140 Systems of random linear equations in GFB) We first show that ak is a monotone increasing function for k such that sn < k < n{\ - s)/2. Indeed, k+\)rk+\ n -k (\ + A -2(logn+x/n) k+\ \ T k+l\ 1+A n-n{\ -e)/2 n(\ -s)/2- 1 x (i _ A ~ 2Qog n + x)/n)k - A - 2(logn + x)/n)k+l \ T X V l + (l-2(logw 1+g 1 -e + 2/n x / A - 2(logn +x)/n)^ - A - 2(logn X V l + (l-2(logh Since 1 + A — 2(logn + x)/n)k > 1, we obtain 1 - e + 2/n l-e + 2/n For sufficiently large n, (l+e)/(l -s + 2/n) >l+e. Moreover, for k satisfying sn <k<n{\ - e/2), 2(logn+x)\k ^ e_2k{logn+x)/n where c is the constant e~2sx. Thus, for sufficiently large n,
3.3 Rank of sparse matrices 141 If we estimate S2, we can use the monotonicity of ak to obtain the inequality S2 < Let us estimate (n\[ I ( n Since a rough estimate is acceptable, we content ourselves with the bound e~u <2 du{\ + o Here we used the well-known asymptotics (~Z e-u2'2du = -e-z2l2{\+ J-00 z as z -> 00. Thus, there exists a constant a such that « Let us estimate the second factor of ak2. It is clear that 1 + (l - 2^n+XAh\ < (! + e-2k2(\ogn+X)/ny where Z) is a positive constant. By combining the estimates of the two factors of a^2, we obtain the bound and 62 —> 0 if we choose s < 1/5. It remains to estimate
$S_3$. It is clear that
$$S_3 \le r_{k_2}^T \sum_{k_2 < k < k_3} \binom{n}{k} \frac{1}{2^n},$$
and $S_3 \to 0$ if $\varepsilon < 1/5$. ∎

Proof of Theorem 3.3.1. The assertion of Theorem 3.3.1 follows from Lemmas 3.3.1 and 3.3.2 because by Lemma 3.3.2,
$$P\{s(A) = \xi_{n,T}\} \to 1$$
under the conditions of the theorem. ∎

The following theorem is a corollary to Theorem 3.3.1. Suppose that
$$\frac{\log T + x}{T} \le p_j^{(t)} \le 1 - \frac{\log T + x}{T}, \qquad (3.3.2)$$
where $x$ is a constant and $t = 1, \ldots, T$, $j = 1, \ldots, n$.

Theorem 3.3.2. If $n, T \to \infty$ such that $T/n \to a$, $1 < a < \infty$, and condition (3.3.2) is valid, then the distribution of $s(A)$ converges to the Poisson distribution with parameter $\lambda = e^{-x}/a$.

Proof. Since the rank of a matrix is the maximum number of linearly independent rows or columns, we apply Theorem 3.3.1 to the transpose matrix and obtain the assertion of Theorem 3.3.2. ∎

Because we know the limit distribution of the rank of a matrix $A$, we can obtain some results for the behavior of the solutions to the system of linear equations with the matrix $A$. Let us consider the system
$$AX = B, \qquad (3.3.3)$$
where the elements of the $T \times n$ matrix $A = \|a_{tj}\|$ are independent, and for $t = 1, \ldots, T$, $j = 1, \ldots, n$,
$$P\{a_{tj} = 1\} = \frac{\log n + x}{n},$$
where $x$ is a constant, the column-vector $B = (b_1, \ldots, b_T)$ is independent of $A$, and the random variables $b_1, \ldots, b_T$ are independent, taking the values 0 and 1 with equal probabilities.

Denote by $\mu_{n,T}$ the number of solutions of the system (3.3.3). The examples cited in Section 3.1 show that the consistency of linear systems plays a particular
role in some of the problems related to such systems. The probability of consistency $P_{n,T}$ of system (3.3.3) is the probability that the system has at least one solution:
$$P_{n,T} = P\{\mu_{n,T} > 0\}.$$
By using Theorem 3.3.1 we can easily prove the following assertion.

Theorem 3.3.3. If $n, T \to \infty$ such that $T/n \to a$, $0 < a < 1$, and condition (3.3.1) is valid, then
$$P_{n,T} \to e^{-\lambda/2},$$
where $\lambda = a e^{-x}$.

Proof. If the rank $r(A)$ of $A$ equals $r$, then
$$P\{\mu_{n,T} > 0 \mid r(A) = r\} = 2^{-(T - r)}. \qquad (3.3.4)$$
Indeed, let the linearly independent rows have the indices $1, 2, \ldots, r$. Then each of the rows with indices $r + 1, \ldots, T$ is a linear combination of the first $r$ rows, and for the system to be consistent, each of the right-hand sides $b_{r+1}, \ldots, b_T$ must satisfy a linear relation of the form
$$\varepsilon_{1t} b_1 + \cdots + \varepsilon_{rt} b_r = b_t, \qquad t = r + 1, \ldots, T, \qquad (3.3.5)$$
where $\varepsilon_{1t}, \ldots, \varepsilon_{rt}$ are constants taking the values 0 and 1. The probability of the validity of any of the relations (3.3.5) is equal to 1/2 and, hence, assertion (3.3.4) is true. Since $\{r(A) = r\} = \{s(A) = T - r\}$, by the total probability formula,
$$P_{n,T} = \sum_{r=0}^{T} P\{r(A) = r\} \frac{1}{2^{T-r}} = \sum_{s=0}^{T} \frac{1}{2^s} P\{s(A) = s\}. \qquad (3.3.6)$$
The last series from (3.3.6) is majorized by the series $\sum_{s=0}^{\infty} 2^{-s}$ and converges uniformly. Therefore it is possible to pass to the limit under the sum in (3.3.6). Passing to the limit with the help of Theorem 3.3.1 yields
$$P_{n,T} \to \sum_{s=0}^{\infty} \frac{\lambda^s e^{-\lambda}}{2^s s!} = e^{-\lambda/2},$$
where $\lambda = a e^{-x}$. ∎

3.4. Cycles and consistency of systems of random equations

In this section, we consider a system of $T$ equations in GF(2):
$$x_{i(t)} + x_{j(t)} = \beta_t, \qquad t = 1, \ldots, T, \qquad (3.4.1)$$
where $i(t)$, $j(t)$, $t = 1, \ldots, T$, are independent random variables that take the values $1, \ldots, n$ with equal probabilities, and the variables $\beta_1, \ldots, \beta_T$ take the values 0 and 1. We denote by $A_{n,T}$ the matrix of this system. As in Section 3.1, we associate the matrix $A_{n,T}$ to a graph $G_{n,T}$ with $n$ labeled vertices that correspond to the variables $x_1, \ldots, x_n$. The graph has $T$ edges $(i(t), j(t))$, $t = 1, \ldots, T$. Thus the edges of the graph $G_{n,T}$ may be considered an outcome of $T$ independent trials: in each trial, an edge joins two different vertices $i$ and $j$ with probability $2n^{-2}$ and forms the loop at a vertex $i$ with probability $n^{-2}$, $i, j = 1, \ldots, n$. Thus the graph $G_{n,T}$ is the same as the graph considered in Section 2.3.

Denote by $\mu_{n,T}$ the number of solutions of the system (3.4.1) and consider the probability of consistency
$$P_{n,T} = P\{\mu_{n,T} > 0\}.$$
We want to express $P_{n,T}$ in terms of the characteristics of $G_{n,T}$. Denote by $\varkappa_{n,T}$ the number of components of the graph $G_{n,T}$.

Theorem 3.4.1. If $\beta_1, \ldots, \beta_T$ are independent random variables that take the values 0 and 1 with equal probabilities and do not depend on $A_{n,T}$, then
$$P_{n,T} = \sum_{k=1}^{n} P\{\varkappa_{n,T} = k\} \frac{1}{2^{T - n + k}}.$$

Proof. We first assume that $G_{n,T}$ is a connected graph. We can then choose a tree that is a skeleton (spanning tree) of the graph. This tree contains $n - 1$ edges that correspond to a subsystem containing $n - 1$ equations of the system. If we assign a fixed value to one of the unknowns, then with the help of the corresponding subsystem, we obtain the values of all other unknowns. Consequently, the right-hand sides of the remaining $T - n + 1$ equations must each take a fixed value for the system to be consistent. Since $\beta_1, \ldots, \beta_T$ are independent and take the values 0 and 1 with probabilities 1/2, the probability of consistency is $(1/2)^{T - n + 1}$ for $G_{n,T}$ connected.

Now assume the graph $G_{n,T}$ consists of $k$ components with $n_1, \ldots, n_k$ vertices and $T_1, \ldots, T_k$ edges, respectively.
The whole system is consistent if and only if each of its subsystems is consistent. Under the condition that the number of components $\varkappa_{n,T} = k$ and, consequently, that the system decomposes into $k$ disjoint subsystems, the probability of consistency is
$$\frac{1}{2^{T_1 - n_1 + 1}} \cdot \frac{1}{2^{T_2 - n_2 + 1}} \cdots \frac{1}{2^{T_k - n_k + 1}} = \frac{1}{2^{T - n + k}}.$$
When we apply the formula of total probability, we obtain the assertion of the theorem. ∎
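The conditional statement in this proof can be verified exhaustively for a small fixed graph. The sketch below is only an illustration with hypothetical function names. It assumes vertices are numbered from 0, counts components with a union-find structure, and enumerates all right-hand sides and all assignments of the unknowns, so it is usable only for very small $n$ and $T$.

```python
from itertools import product

def components(n, edges):
    """Number of connected components of a multigraph with loops (union-find)."""
    parent = list(range(n))

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for i, j in edges:
        parent[find(i)] = find(j)
    return len({find(v) for v in range(n)})

def consistent_fraction(n, edges):
    """Fraction of right-hand sides beta for which x_i + x_j = beta_t is solvable in GF(2)."""
    T = len(edges)
    good = 0
    for beta in product((0, 1), repeat=T):
        solvable = any(
            all((x[i] ^ x[j]) == b for (i, j), b in zip(edges, beta))
            for x in product((0, 1), repeat=n)
        )
        good += solvable
    return good / 2 ** T

if __name__ == "__main__":
    # A triangle, a loop attached to a pendant edge, and an isolated vertex.
    edges = [(0, 1), (1, 2), (2, 0), (3, 3), (3, 4)]
    n = 6
    k = components(n, edges)
    print(k, consistent_fraction(n, edges), 2.0 ** -(len(edges) - n + k))
```

In this example the triangle and the loop each impose one parity condition on the right-hand sides, which matches the exponent $T - n + k$.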
According to Theorem 3.4.1, the number of components of the graph $G_{n,T}$ can be used to investigate the system (3.4.1). Likewise, we can consider the maximum number of independent critical sets $s(A_{n,T})$ introduced in Section 3.1. According to Theorem 3.1.1, the maximum number of independent critical sets $s(A_{n,T})$ and the rank $r(A_{n,T})$ of the matrix $A_{n,T}$ are related by the equality
$$s(A_{n,T}) + r(A_{n,T}) = T.$$
It is not difficult to prove that $\varkappa_{n,T} = n - T + s(A_{n,T})$, and the rank $r(A_{n,T}) = n - \varkappa_{n,T}$. Thus, the assertion of Theorem 3.4.1 is equivalent to relation (3.3.6).

We remarked in Section 3.1 that a critical set of $A_{n,T}$ corresponds to a cycle or a union of cycles in the graph $G_{n,T}$, and the maximum number of independent critical sets $s(A_{n,T})$ equals the maximum number of independent cycles. The graph $G_{n,T}$ was studied in Section 2.3. We have seen that if $n, T \to \infty$ such that $2T/n \to \lambda$, $0 < \lambda < 1$, then with probability tending to 1, the graph has no components with more than one cycle. Therefore, under these conditions, all cycles of $G_{n,T}$ are isolated and, consequently, independent.

As in Section 3.1, we denote by $\nu(G_{n,T})$ the number of cycles in $G_{n,T}$. It was proven (see Theorems 2.3.3 and 2.3.4) that if $2T/n \to \lambda$, $0 < \lambda < 1$, then
$$P\{G_{n,T} \in \mathcal{A}_{n,T}\} \to 1, \qquad (3.4.2)$$
where $\mathcal{A}_{n,T}$ is the set of graphs whose components are trees and unicyclic components, and for any fixed $k = 0, 1, \ldots$,
$$P\{\nu(G_{n,T}) = k\} \to \frac{\Lambda^k e^{-\Lambda}}{k!}, \qquad (3.4.3)$$
where
$$\Lambda = -\frac{1}{2} \log(1 - \lambda).$$
These results allow us to analyze the probability $P_{n,T}$ of consistency of the system (3.4.1).

Theorem 3.4.2. If $n, T \to \infty$ such that $2T/n \to \lambda$, $0 < \lambda < 1$, and the right-hand sides $\beta_1, \ldots, \beta_T$ of the system (3.4.1) are independent random variables that take the values 0 and 1 with probabilities 1/2 and do not depend on $A_{n,T}$, then
$$P_{n,T} \to (1 - \lambda)^{1/4}.$$
Proof. When we use Theorem 3.4.1 or the equivalent formula (3.3.6), we find that
$$P_{n,T} = \sum_{k=1}^{n} P\{\varkappa_{n,T} = k\} \frac{1}{2^{T - n + k}} = \sum_{r=0}^{T} P\{r(A_{n,T}) = r\} \frac{1}{2^{T - r}} = \sum_{s=0}^{T} P\{s(A_{n,T}) = s\} \frac{1}{2^s}.$$
Taking into account (3.4.2) and (3.4.3) and passing to the limit under the sum yield
$$P_{n,T} \to \sum_{s=0}^{\infty} \frac{\Lambda^s e^{-\Lambda}}{2^s s!} = e^{-\Lambda/2} = (1 - \lambda)^{1/4}. \ ∎$$

In the same way, we can treat the nonequiprobable case, where the indices $i(t)$, $j(t)$, $t = 1, \ldots, T$, of the variables of system (3.4.1) are independent identically distributed random variables that take the value $i$ with probability $p_i$, $i = 1, \ldots, n$, $p_1 + \cdots + p_n = 1$. As before, let the right-hand sides $\beta_1, \ldots, \beta_T$ be independent, take the values 0 and 1 with equal probabilities, and not depend on $A_{n,T}$. We retain the notation $P_{n,T}$ for the probability of consistency of such a system.

Theorem 3.4.3. Let $p_i = a_i/n$, where $a_i = a_i(n)$, $0 < \varepsilon_0 \le a_i \le \varepsilon_1 < \infty$, $i = 1, \ldots, n$, $\varepsilon_0$ and $\varepsilon_1$ are constants, and let
$$\sigma^2 = \lim_{n \to \infty} \frac{1}{n} \sum_{i=1}^{n} a_i^2.$$
If $n, T \to \infty$ such that $2T/n \to \lambda$ and $\sigma^2 \lambda < 1$, then
$$P_{n,T} \to (1 - \sigma^2 \lambda)^{1/4}.$$

Proof. In Section 2.4, the nonequiprobable graph $G_{n,T}$ corresponding to the matrix $A_{n,T}$ was considered. The graph contains $n$ labeled vertices and $T$ edges that can be obtained by the following $T$ independent trials. In each trial, one edge is drawn. The edge connects two different vertices $i$ and $j$ with probability $2 p_i p_j$, and a loop at a vertex $i$ is formed with probability $p_i^2$, $i, j = 1, \ldots, n$, $p_1 + \cdots + p_n = 1$. According to Theorem 2.4.1, under the conditions of Theorem 3.4.3 for any fixed $k = 0, 1, \ldots$,
$$P\{\nu(G_{n,T}) = k\} \to \frac{\Lambda^k e^{-\Lambda}}{k!},$$
where $\nu(G_{n,T})$ is the number of cycles in $G_{n,T}$, and
$$\Lambda = -\frac{1}{2} \log(1 - \sigma^2 \lambda).$$
If we reason as we did in the proof of Theorem 3.4.2, we obtain the assertion of Theorem 3.4.3. ∎

The proofs of Theorems 3.4.2 and 3.4.3 are mainly based on assertion (3.3.4) that
$$P\{\mu_{n,T} > 0 \mid r(A_{n,T}) = r\} = 2^{-T + r}. \qquad (3.4.4)$$
The proof of this assertion in Section 3.3 used the fact that if $r$ rows are linearly independent and $r(A_{n,T}) = r$, then each of the remaining rows is a linear combination of these $r$ rows, and the system is consistent only if the corresponding right-hand sides satisfy a certain linear relation. If the right-hand sides $\beta_1, \ldots, \beta_T$ are independent, then such a relation is satisfied with probability 1/2, and the events corresponding to different relations are independent. In other words, each cycle in $G_{n,T}$ imposes a restriction on the right-hand sides $\beta_1, \ldots, \beta_T$, these restrictions are independent, and each of them is satisfied with probability 1/2.

If the right-hand sides $\beta_1, \ldots, \beta_T$ take the values 0 and 1 with unequal probabilities, then property (3.4.4) is not valid, and the corresponding formula for the probability $P_{n,T}$ of the consistency of the system becomes more complicated. In this section, we prove the following assertions. Let
$$P_{n,T}(k) = P\{\mu_{n,T} > 0, \ \nu(G_{n,T}) = k\}, \qquad P_{n,T} = P\{\mu_{n,T} > 0\}.$$

Theorem 3.4.4. Let the right-hand sides $\beta_1, \ldots, \beta_T$ of the system (3.4.1) be independent identically distributed random variables that take the values 0 and 1 with probabilities $1 - p$ and $p$, respectively, $0 \le p \le 1$, $\Delta = 1 - 2p$. If $n, T \to \infty$ such that $2T/n \to \lambda$, $0 < \lambda < 1$, then for any fixed $k = 0, 1, \ldots$,
$$P_{n,T}(k) \to \frac{\sqrt{1 - \lambda}}{k!} \Big(-\frac{1}{4} \log\big((1 - \lambda)(1 - \Delta\lambda)\big)\Big)^k, \qquad P_{n,T} \to \Big(\frac{1 - \lambda}{1 - \Delta\lambda}\Big)^{1/4}.$$

Theorem 3.4.5. Let the right-hand sides $\beta_1, \ldots, \beta_T$ of the system (3.4.1) take the values 0 and 1, and let $m = m(T)$ be the number of 1's in $\beta_1, \ldots, \beta_T$.
If $n, T \to \infty$ such that $2T/n \to \lambda$, $0 < \lambda < 1$, and $m/T \to p$, $0 \le p \le 1$, then for any fixed $k = 0, 1, \ldots$,
$$P_{n,T}(k) \to \frac{\sqrt{1 - \lambda}}{k!} \Big(-\frac{1}{4} \log\big((1 - \lambda)(1 - \Delta\lambda)\big)\Big)^k, \qquad P_{n,T} \to \Big(\frac{1 - \lambda}{1 - \Delta\lambda}\Big)^{1/4},$$
where $\Delta = 1 - 2p$.

Before proceeding to the proof of these theorems, we will establish some auxiliary results. Let $\beta_1, \ldots, \beta_T$ be independent identically distributed random variables that take the values 0 and 1 with probabilities $1 - p$ and $p$, respectively; let $\Delta = 1 - 2p$; and let $E$ be the set of the even numbers. Let $r_0 = 0$ and $r_1, \ldots, r_k$ be positive integers. We consider the random variables
$$\eta_i = \beta_{r_0 + \cdots + r_{i-1} + 1} + \cdots + \beta_{r_0 + \cdots + r_i}, \qquad i = 1, \ldots, k.$$

Lemma 3.4.1.
$$P\{\eta_i \in E, \ i = 1, \ldots, k\} = \frac{1}{2^k} (1 + \Delta^{r_1}) \cdots (1 + \Delta^{r_k}).$$

Proof. It suffices to note that the random variables $\eta_1, \ldots, \eta_k$ are independent and that the probability of the event of the sum $\beta_1 + \cdots + \beta_r$ being even equals $(1 + \Delta^r)/2$. ∎

When the variables $\beta_1, \ldots, \beta_T$ are nonrandom, we need a similar assertion for the following scheme of allocating $m$ particles into $T$ cells. The cells are divided into $k + 1$ groups of cells containing $r_1, \ldots, r_k$, $T - r_1 - \cdots - r_k$ cells, respectively. We assume that each cell can contain at most one particle, that $m \le T$, and that each of the $\binom{T}{m}$ possible allocations is equiprobable. We introduce the random variables $\xi_1, \ldots, \xi_T$, setting $\xi_i = 0$ if the cell number $i$ is empty, and $\xi_i = 1$ otherwise, for $i = 1, \ldots, T$. By analogy with the random variables $\eta_1, \ldots, \eta_k$, we define the random variables
$$\zeta_i = \xi_{r_0 + \cdots + r_{i-1} + 1} + \cdots + \xi_{r_0 + \cdots + r_i}, \qquad i = 1, \ldots, k.$$
It is not difficult to verify the following assertions.

Lemma 3.4.2. If $r_1, \ldots, r_k$ are fixed, $T \to \infty$, and $m/T \to 0$, then
$$P\{\zeta_i \in E, \ i = 1, \ldots, k\} \to 1.$$

Lemma 3.4.3. If $r_1, \ldots, r_k$ are fixed, $T \to \infty$, and $m/T \to 1$, then
$$P\{\zeta_i \in E, \ i = 1, \ldots, k\} \to 1$$
if all $r_1, \ldots, r_k$ are even; and
$$P\{\zeta_i \in E, \ i = 1, \ldots, k\} \to 0$$
if at least one of $r_1, \ldots, r_k$ is odd.

Lemma 3.4.4. If $r_1, \ldots, r_k$ are fixed, $T \to \infty$, and $m/T \to p$, $0 < p < 1$, then
$$P\{\zeta_i \in E, \ i = 1, \ldots, k\} \to \frac{1}{2^k} (1 + \Delta^{r_1}) \cdots (1 + \Delta^{r_k}),$$
where $\Delta = 1 - 2p$.

We now consider the graph $G_{n,T}$ and mark the cycles in the graph by the following rule. Recall that $\mathcal{A}_{n,T}$ is the set of all graphs with $n$ labeled vertices and $T$ edges whose components are trees and unicyclic components, allowing cycles of length 1 and 2. If a realization of the graph $G_{n,T}$ belongs to the set $\mathcal{A}_{n,T}$, then every cycle of length $r$ is marked with probability $p_r$ independently of the others. If the graph contains a component with more than one cycle, then no cycle of the graph is marked. We denote by $p_{n,T}(k)$ the probability of the event that the number of cycles $\nu(G_{n,T})$ in the graph $G_{n,T}$ is equal to $k$ and all cycles are marked. It is clear that the probability $p_{n,T}$ of the event that all cycles are marked equals
$$p_{n,T} = \sum_{k=0}^{\infty} p_{n,T}(k).$$

As in Section 1.7, we denote by $d_m$ the number of mappings of the set $\{1, \ldots, m\}$ into itself whose graphs are connected, and by $d_m^{(r)}$ the number of mappings of the set $\{1, \ldots, m\}$ into itself whose graphs are connected and contain a cycle of length $r$. Let $F_{n,N}$ denote the number of forests with $n$ labeled vertices and $N$ trees, $T = n - N$. Explicit expressions for $d_m$ and $d_m^{(r)}$ are well known. By using the formula for the number of rooted trees, we obtain
$$d_m^{(r)} = \frac{(m-1)! \, m^{m-r}}{(m-r)!};$$
hence,
$$d_m = \sum_{r=1}^{m} d_m^{(r)} = (m-1)! \sum_{k=0}^{m-1} \frac{m^k}{k!}.$$

Lemma 3.4.5. For any integer $k$, $1 \le k \le \min(n, T)$,
$$p_{n,T}(k) = \frac{T! \, 2^{T-k}}{n^{2T} \, k!} \sum_{m=k}^{n} \binom{n}{m} F_{n-m,N} \sum_{m_1 + \cdots + m_k = m} \frac{m!}{m_1! \cdots m_k!} D_{m_1} \cdots D_{m_k},$$
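where
$$D_m = \sum_{r=1}^{m} d_m^{(r)} p_r,$$
and for $k = 0$,
$$p_{n,T}(0) = \frac{T! \, 2^T}{n^{2T}} F_{n,N}.$$

Proof. For $k = 0$, the assertion is obvious. As in Section 1.7, let us denote by $b_n^{(r)}$ the number of connected graphs with $n$ labeled vertices and one cycle of length $r$. It is clear that
$$b_n^{(1)} = d_n^{(1)}, \qquad b_n^{(2)} = d_n^{(2)}, \qquad b_n^{(r)} = d_n^{(r)}/2, \quad r \ge 3. \qquad (3.4.5)$$
Denote by $C_{n,T}$ the event that the graph $G_{n,T}$ contains no unmarked cycles. We represent the event $\{\nu(G_{n,T}) = k, \ G_{n,T} \in \mathcal{A}_{n,T}, \ C_{n,T}\}$ as a union of the following disjoint events: in a specific order, $T$ trials give $T$ fixed edges that form a graph consisting of trees and $k$ unicyclic components, each including a marked cycle. It follows from this description that
$$p_{n,T}(k) = P\{\nu(G_{n,T}) = k, \ G_{n,T} \in \mathcal{A}_{n,T}, \ C_{n,T}\} = \frac{T! \, 2^T}{n^{2T}} \sum_{m=k}^{n} \binom{n}{m} F_{n-m,N} \sum_{m_1 + \cdots + m_k = m} \frac{m!}{m_1! \cdots m_k! \, k!} \sum_{r_1=1}^{m_1} \cdots \sum_{r_k=1}^{m_k} \frac{b_{m_1}^{(r_1)} p_{r_1} \cdots b_{m_k}^{(r_k)} p_{r_k}}{2^{s_1 + s_2}},$$
where $s_1 = s_1(r_1, \ldots, r_k)$ is the number of 1's among $r_1, \ldots, r_k$, and $s_2 = s_2(r_1, \ldots, r_k)$ is the number of 2's among $r_1, \ldots, r_k$. The factor $2^{-s_1}$ appears because the probability $2n^{-2}$ is replaced by $n^{-2}$ in $s_1$ cases. The factor $2^{-s_2}$ reflects the fact that permuting trials in which two identical edges occur results in the same graph. The lemma follows from the relations (3.4.5). ∎

Theorem 3.4.6. If $n, T \to \infty$ such that $2T/n \to \lambda$, $0 < \lambda < 1$, then for any fixed $k = 0, 1, \ldots$,
$$p_{n,T}(k) \to \frac{D^k(a) \sqrt{1 - \lambda}}{2^k k!},$$
where
$$D(x) = \sum_{m=1}^{\infty} \frac{x^m}{m!} \sum_{r=1}^{m} d_m^{(r)} p_r, \qquad a = \lambda e^{-\lambda}.$$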
3.4 Cycles and consistency of systems of random equations 151 Proof. The proof is similar to the proof of Theorem 1.8.2. We partition the sum from Lemma 3.4.5 into two parts. We put It is clear that for any x in the domain of convergence of the series D{x) = oo ''m D xm we have y y m\\ ¦ ¦ ¦mici *—' , ml m>M m\-\ \-mk=m m>M/k C.4.6) Along with the function D(x), let us introduce the generating function of the number of connected mappings , v—v UmX d(x) = m\ m—\ The inequality D(x) < d(x) C.4.7) holds because Also, m — <(m-l)\em, which implies < E m>M/k m>M/k Let 00 x v—\ n x By Example 1.3.2 and A.4.8), d(x) = loga(x), a{x) = A - We put a = IT In and x = ae~a for a < 1. Then 6(x)=a,
152 Systems of random linear equations in GFB) Under the hypothesis of the theorem, a x = ae~a, there exists q < 1 such that ex n. Therefore 2T/n -> k, 0 < k < 1, and for ael~a < q < 1 for sufficiently large */* Using estimates A.8.8), A.8.9), and C.4.6)-C.4.9) yields T\ B m>M m\-\ \-mk=m m>M m\-\ m\\ C.4.9) n\ m\Dmr-Dmk )tn-m,N ; ¦¦¦</ „,, m>M/k n where c\, C2 are constants. Thus, under the hypothesis of the theorem, S2 If n,T -> oo,2T/n -> k, 0 < k < 1, then by virtue of A.8.7), 0. uniformly in m < M = T1/4. Therefore, for any fixed k = 1,2,..., m<M m\-\ \-mk=m m=k m\+- By using the estimate of 52, we obtain f\^k ~?k\
3.4 Cycles and consistency of systems of random equations 153 Combining the estimates of S\ and 52, we obtain, under the hypothesis of the theorem, = P{v(GnJ) = k, Gn,T eAn,T, CnJ) 2kk\ C.4.10) Hence the assertion of Theorem 3.4.6 for k > 1 follows, since x = ae a = {2T/n)e~2Tln -* Xe~x = a and D(x) -* D(a). We use A.8.6) and the repre- representation from Lemma 3.4.5 and conclude that Corollary 3.4.1. Ifn,T —> oo such that 2T/n —> X, 0 < X < 1, then the probability pnj of the event that the graph Gnj contains no unmarked cycles satisfies the relation Proof. We denote by pn T the probability of only marked cycles in the case where the graph has k unicyclic components and all the probabilities pr are equal to 1, r = 1, 2, In this case, D(a) = d(a) = 2A = - log A — X), and Theorem 3.4.6 gives Ake~A k\ = 0,1,.... To prove the corollary, it suffices to show that in the sum Pn,T = C.4.11) k=0 one can pass to the limit under the sum. Let us show that for any s > 0, there exists K such that We choose K such that oo E k=K+l OO Pnj{k) < S. S r C.4.12) k\ k=K+\ and for fixed K, we choose no so that for n > no, K Ake~A k\ k=0 s —. 2
154 Systems of random linear equations in GFB) Then, for n > no, oo V^ (') i Z_^ L n, i k=K+\ and therefore oo «- E k=K+\ Ake~K k\ oo E / k=K+\ K k=0 Ake- k=0 k\ s r Since pnj(k) < pn T(k), estimate C.4.12) and the validity of passing to the limit under the sum are established. ¦ Proof of Theorems 3.4.4 and 3.4.5. A cycle leads to the inconsistency of system C.4.1) if the sum of the right-hand sides of the subsystem corresponding to the cycle is odd. Let pr be the probability that this sum is even for a cycle of length r. Then Pnj{k) = pnj{k) for any k = 0, 1,.... Therefore Theorems 3.4.4 and 3.4.5 are direct corollaries to Theorem 3.4.6 and the fact proved above that one can pass to the limit under the sum in C.4.11). To prove Theorem 3.4.4, we notice that in this case, according to Lemma 3.4.1, pr = Ar)/2, where A = 1 - 2p; therefore oo m D(x) = m = \ m\ m = \ r=\ 2m! C.4.13) m = \ m = \ r=\ where 00 m d(x, A) = > . > . *m x m m=\ r=\ m\ For* = ae a, 0 < a < 1, and = -\og{\-a), ,A) = -log(l-aA).
3.4 Cycles and consistency of systems of random equations 155 Indeed, oo m ,(r) .r m oo oo ,(r) .r m d(x, a)=v-v-^ax ^^am ax m! /—j I I m! l , m=\ r=\ r=\ m=\ {jn - r)\ *-" Z-" t\ r=\m=r v r=l /=0 By using the well-known equality 00 t=0 from [124], Chapter 2, Problem 210 (see also [126]), we obtain 00 Arxrear ^ Arar d(x, A) = r=l ^ r=\ We conclude by noting that for a=2T/n-^-X,0<X< 1, </(*) -> - log A - X), d(x, A) = - log A - a A) -+ log A - A A). Let us turn to the proof of Theorem 3.4.5. If m/T -> 0, then for any fixed k, all the cycles are marked with probability tending to 1. Therefore In the case where m/T —> 1, we have pr —> 0 for odd r and pr —> 1 for even r by Lemma 3.4.3. Therefore, in this case, n _y nB) - oo D{x) = .... m\ m=\ m=\ It is not difficult to see that In the case where m/T -> /7, 0 < p < l,by Lemma 3.4.4, Pr -> A + Ar)/2,
and, as in (3.4.13),

\[ D(x) \to D(a) = \frac{d(a) + d(a,\Delta)}{2} = -\frac{1}{2}\log\bigl((1-\lambda)(1-\Delta\lambda)\bigr). \]

3.5. Hypercycles and consistency of systems of random equations

In Section 3.2, we studied the rank of random matrices and found, in particular, that if the elements of a $T \times n$ matrix $A = \|a_{tj}\|$ are independent identically distributed random variables taking the values 0 and 1 with equal probabilities, then the rank $r(A)$ of the matrix $A$ has a threshold property: if $T/n \to \alpha$ and $\alpha < 1$, then $P\{r(A) = T\} \to 1$, and if $T/n \to \alpha$ and $\alpha > 1$, then $P\{r(A) = n\} \to 1$. In other words, the maximum number of independent critical sets $s(A)$ tends in probability to zero in the former case and to infinity in the latter case. A similar property apparently holds for the sparse matrices considered in Section 3.3: we proved only that if $\alpha < 1$, then $s(A)$ has in the limit a Poisson distribution, and $E s(A) \to \infty$ for $\alpha > 1$.

In Section 3.4, we considered systems with at most two unknowns in each equation. It was shown that if $T/n \to \alpha$, $0 < \alpha < 1/2$, then the distribution of the maximal number of independent critical sets, or independent cycles in the corresponding graph, approaches the Poisson distribution with parameter $\Lambda = -\frac{1}{2}\log(1 - 2\alpha)$. As follows from Theorem 2.1.6, if $\alpha > 1/2$, then $s(A)$ tends in probability to infinity.

The case of a matrix with independent and identically distributed random elements taking the values 0 and 1 with probabilities 1/2 and the case of a matrix with at most two elements in each row, studied in Section 3.4, can be considered as the extreme cases in terms of the behavior of the rank and the maximum number of independent critical sets. In these cases, the threshold effect appears at the points $T/n = 1$ and $T/n = 1/2$, respectively. In this section, we consider an intermediate case and obtain a weaker form of the threshold effect.
We consider the system of random linear equations in GF(2):

\[ x_{i_1(t)} + \dots + x_{i_r(t)} = b_t, \quad t = 1, \dots, T, \tag{3.5.1} \]

where $i_1(t), \dots, i_r(t)$, $t = 1, \dots, T$, are independent identically distributed random variables taking the values $1, \dots, n$ with equal probabilities, and the independent random variables $b_1, \dots, b_T$ do not depend on the left-hand side of the system and take the values 0 and 1 with equal probabilities. If $r = 2$, we obtain the system considered in Section 3.4.

In Section 3.1, we introduced the notions of critical sets for a matrix and hypercycles for the hypergraph corresponding to a matrix. Denote by $A_{r,n,T}$ the matrix of system (3.5.1) and by $G_{r,n,T}$ the hypergraph with $n$ vertices and $T$ hyperedges $e_1, \dots, e_T$ that corresponds to this matrix. Thus we consider a random hypergraph $G_{r,n,T}$ whose matrix $A = A_{r,n,T} = \|a_{tj}\|$ has the following structure. The elements $a_{tj}$, $t = 1, \dots, T$, $j = 1, \dots, n$, are random variables, and the rows of the matrix are independent. There are $r$ ones allocated to each row: each 1, independently of the others, is placed in each of the $n$ positions with probability $1/n$, and $a_{tj}$ equals 1 if there is an odd number of 1's in position $j$ of row $t$. Therefore, there are no more than $r$ ones in each row.

For such regular hypergraphs, the following threshold property holds: if $n, T \to \infty$ such that $T/n \to \alpha$, then an abrupt change in the behavior of the rank of the matrix $A_{r,n,T}$ occurs when the parameter $\alpha$ passes the critical value $\alpha_r$. This property can be expressed in terms of the total number of hypercycles in $G_{r,n,T}$.

Let $s(A_{r,n,T})$ be the maximum number of independent critical sets of $A_{r,n,T}$ or independent hypercycles of the hypergraph $G_{r,n,T}$. Then

\[ S(A_{r,n,T}) = 2^{s(A_{r,n,T})} - 1 \]

is the total number of critical sets or hypercycles. In this section, we prove that the following threshold property is true for $S(A_{r,n,T})$.

Theorem 3.5.1. Let $r \ge 3$ be fixed, and let $T, n \to \infty$ such that $T/n \to \alpha$. Then there exists a constant $\alpha_r$ such that $E S(A_{r,n,T}) \to 0$ for $\alpha < \alpha_r$ and $E S(A_{r,n,T}) \to \infty$ for $\alpha > \alpha_r$. The constant $\alpha_r$ is the first component of the vector that is the unique solution of the system of equations

\[ e^{-x}\cosh\lambda \left(\frac{\alpha r}{\alpha r - x}\right)^{\alpha} = 1, \qquad \frac{x}{\lambda}\left(\frac{\alpha r - x}{x}\right)^{1/r} = 1, \qquad \lambda\tanh\lambda = x, \]

with respect to the variables $\alpha$, $x$, and $\lambda$.

The numerical solution of the system of equations gives the following values of the critical constants:

$\alpha_3 = 0.8894\ldots$, $\alpha_4 = 0.9671\ldots$, $\alpha_5 = 0.9891\ldots$, $\alpha_6 = 0.9969\ldots$, $\alpha_7 = 0.9986\ldots$, $\alpha_8 = 0.9995\ldots$.
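The critical constants can be checked numerically. The following is a minimal sketch, not the author's procedure: it assumes the system of Theorem 3.5.1 in the form $e^{-x}\cosh\lambda\,(\alpha r/(\alpha r - x))^{\alpha} = 1$, $(x/\lambda)((\alpha r - x)/x)^{1/r} = 1$, $\lambda\tanh\lambda = x$, eliminates $\alpha$ and $x$, and finds $\lambda$ by bisection. The function names `g` and `critical_alpha` and the bracketing interval are illustrative choices.

```python
import math


def g(lam: float, r: int) -> float:
    """Logarithm of the first equation after eliminating alpha and x.

    Third equation:  x = lam * tanh(lam).
    Second equation: alpha * r = x * (1 + tanh(lam) ** (-r)),
    which also gives alpha*r / (alpha*r - x) = 1 + tanh(lam) ** r.
    """
    t = math.tanh(lam)
    x = lam * t
    alpha = x * (1.0 + t ** (-r)) / r
    return -x + math.log(math.cosh(lam)) + alpha * math.log1p(t ** r)


def critical_alpha(r: int, iterations: int = 200) -> float:
    """Approximate alpha_r by bisection on lam (bracket is an assumption)."""
    lo, hi = 0.1, r + 1.0
    if not (g(lo, r) < 0.0 < g(hi, r)):
        raise ValueError("bracket does not contain a sign change")
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if g(mid, r) < 0.0:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    t = math.tanh(lam)
    return lam * t * (1.0 + t ** (-r)) / r


if __name__ == "__main__":
    for r in range(3, 9):
        print(r, critical_alpha(r))
```

The printed values can be compared with the constants $\alpha_3, \dots, \alpha_8$ listed above. Bisection is used only because it needs no derivative; any one-dimensional root finder would do.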
Expanding the solution of the system in powers of $e^{-r}$ yields

\[ \alpha_r \approx 1 - \frac{e^{-r}}{\log 2} - \dots, \]

where the dots stand for the terms of higher order in $e^{-r}$; the expansion gives values close to the exact ones for $r \ge 4$.

Let us give some auxiliary results that will be needed for the proof of Theorem 3.5.1. The total number of hypercycles $S(A_{r,n,T})$ in the hypergraph $G_{r,n,T}$ with the matrix $A_{r,n,T}$ can be represented as a sum of indicators. Let $\xi_{t_1,\dots,t_m} = 1$ if the hypercycle $C = \{e_{t_1}, \dots, e_{t_m}\}$ occurs in $G_{r,n,T}$, and $\xi_{t_1,\dots,t_m} = 0$ otherwise. It is clear that $P\{\xi_{t_1,\dots,t_m} = 1\}$ does not depend on the indices $t_1, \dots, t_m$. Indeed, from the definition of the random hypergraph $G_{r,n,T}$, the indicator $\xi_{t_1,\dots,t_m} = 1$ if and only if there is an even number of 1's in each column of the submatrix consisting of the rows with indices $t_1, \dots, t_m$. The numbers of 1's in the $n$ columns of any $m$ rows, before these numbers are reduced modulo 2, have the multinomial distribution with $rm$ trials and $n$ equiprobable outcomes.

Denote by $\eta_1(s,n), \dots, \eta_n(s,n)$ the contents of the cells in the equiprobable scheme of allocating $s$ particles into $n$ cells. In this notation, the numbers of 1's in the columns of any $m$ rows, before those numbers have been reduced modulo 2, have a distribution that coincides with the distribution of the variables $\eta_1(rm,n), \dots, \eta_n(rm,n)$. Therefore

\[ P\{\xi_{t_1,\dots,t_m} = 1\} = P\{\eta_1(rm,n) \in E, \dots, \eta_n(rm,n) \in E\}, \]

where $E$ is the set of even numbers, and the average number of hypercycles in $G_{r,n,T}$ can be written in the following form:

\[ E S(A_{r,n,T}) = \sum_{m=1}^{T} \binom{T}{m} P_E(rm,n), \tag{3.5.3} \]

where

\[ P_E(rm,n) = P\{\eta_1(rm,n) \in E, \dots, \eta_n(rm,n) \in E\}. \]

Thus, to estimate $E S(A_{r,n,T})$, we need to know the asymptotic behavior of $P_E(rm,n)$. We consider a more general case and obtain the asymptotic behavior of the probabilities

\[ P_R(s,n) = P\{\eta_1(s,n) \in R, \dots, \eta_n(s,n) \in R\}, \]

where $R$ is a subset of the set of all nonnegative integers.
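For small parameters, formula (3.5.3) can be evaluated exactly. The sketch below is illustrative and rests on one assumption not taken from the text: the closed form $P_E(s,n) = 2^{-n} n^{-s} \sum_{k=0}^{n} \binom{n}{k} (n-2k)^s$, a standard identity obtained by expanding $(\cosh z)^n$. The brute-force function is included only to check that identity on tiny cases; all function names are hypothetical.

```python
from fractions import Fraction
from itertools import product
from math import comb


def p_even(s: int, n: int) -> Fraction:
    """P{every cell count is even} when s particles fall uniformly into n cells.

    Closed form from the expansion of cosh(z)**n (an assumption of this sketch).
    """
    total = sum(comb(n, k) * (n - 2 * k) ** s for k in range(n + 1))
    return Fraction(total, 2 ** n * n ** s)


def p_even_bruteforce(s: int, n: int) -> Fraction:
    """Same probability by enumerating all n**s allocations (tiny cases only)."""
    good = 0
    for cells in product(range(n), repeat=s):
        counts = [0] * n
        for c in cells:
            counts[c] += 1
        if all(c % 2 == 0 for c in counts):
            good += 1
    return Fraction(good, n ** s)


def expected_hypercycles(r: int, n: int, T: int) -> Fraction:
    """Right-hand side of (3.5.3): sum over m of C(T, m) * P_E(r*m, n)."""
    return sum(comb(T, m) * p_even(r * m, n) for m in range(1, T + 1))


if __name__ == "__main__":
    print(expected_hypercycles(3, 10, 8))
```

Exact rational arithmetic avoids rounding in the alternating sum; for large $n$ and $T$ the asymptotic estimates of this section are the practical tool.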
The joint distribution of the random variables $\eta_1(s,n), \dots, \eta_n(s,n)$ can be expressed as a conditional distribution of independent random variables $\xi_1, \dots, \xi_n$, identically distributed by the Poisson law with an arbitrary parameter $\lambda$, in the following way (see, e.g., [90]). For any nonnegative integers $s_1, \dots, s_n$ such that $s_1 + \dots + s_n = s$,

\[ P\{\eta_1(s,n) = s_1, \dots, \eta_n(s,n) = s_n\} = P\{\xi_1 = s_1, \dots, \xi_n = s_n \mid \xi_1 + \dots + \xi_n = s\}. \]

Therefore

\[ P_R(s,n) = P\{\eta_1(s,n) \in R, \dots, \eta_n(s,n) \in R\} = \bigl(P\{\xi_1 \in R\}\bigr)^n \, \frac{P\{\xi_1 + \dots + \xi_n = s \mid \xi_1 \in R, \dots, \xi_n \in R\}}{P\{\xi_1 + \dots + \xi_n = s\}}. \]

We now introduce independent identically distributed random variables $\xi_1^{(R)}, \dots, \xi_n^{(R)}$ with the distribution

\[ P\{\xi_1^{(R)} = k\} = P\{\xi_1 = k \mid \xi_1 \in R\}, \quad k = 0, 1, \dots. \]

It is not difficult to see that

\[ P\{\xi_1 + \dots + \xi_n = s \mid \xi_1 \in R, \dots, \xi_n \in R\} = P\{\xi_1^{(R)} + \dots + \xi_n^{(R)} = s\}, \]

and therefore

\[ P_R(s,n) = \bigl(P\{\xi_1 \in R\}\bigr)^n \, \frac{P\{\xi_1^{(R)} + \dots + \xi_n^{(R)} = s\}}{P\{\xi_1 + \dots + \xi_n = s\}}. \tag{3.5.4} \]

Let $x = s/n$ and choose the parameter $\lambda$ of the Poisson distribution in such a way that

\[ x = E\xi_1^{(R)} = \sum_{k \in R} k\, P\{\xi_1^{(R)} = k\}. \]

Let $d$ be the maximum span of the lattice on which the set $R$ is situated and denote the lattice by $\Gamma_R$.

Theorem 3.5.2. If $s, n \to \infty$ such that $n \in \Gamma_R$, then in any interval of the form $0 < x_0 \le x \le x_1 < \infty$,

\[ P_R(s,n) = \frac{d\sqrt{x}}{\sigma} \bigl(P\{\xi_1 \in R\}\bigr)^n \left(\frac{x^x e^{\lambda}}{\lambda^x e^{x}}\right)^{n} (1 + o(1)) \]

uniformly in $x = s/n$, where the parameter $\lambda$ of the Poisson distribution of the random variable $\xi_1$ is the root of the equation $x = E\xi_1^{(R)}$, and $\sigma^2 = D\xi_1^{(R)}$ (the variance).
Proof. The local limit theorem holds for the sum $\xi_1^{(R)} + \dots + \xi_n^{(R)}$. Following the classical proof of the local limit theorem of Gnedenko [49], we prove that if $s, n \to \infty$ such that $n \in \Gamma_R$, then

\[ P\{\xi_1^{(R)} + \dots + \xi_n^{(R)} = s\} = \frac{d}{\sigma\sqrt{2\pi n}} (1 + o(1)) \]

uniformly in $x = s/n$ in any interval of the form $0 < x_0 \le x \le x_1 < \infty$, where $\sigma^2 = D\xi_1^{(R)}$, and $d$ is the span of the lattice $\Gamma_R$. When we substitute this expression into (3.5.4) and take into account that the sum $\xi_1 + \dots + \xi_n$ is distributed by the Poisson law with parameter $\lambda n$, we obtain the assertion of the theorem. ∎

Note that (3.5.4) implies the estimate

\[ P_R(s,n) \le \bigl(P\{\xi_1 \in R\}\bigr)^n \frac{s!\, e^{\lambda n}}{(\lambda n)^s}, \]

where $P_R(s,n)$ does not depend on $\lambda$, and on the right-hand side any positive value can be assigned to this parameter. Let $E = \{0, 2, \dots\}$. In this case, $P\{\xi_1 \in E\} = e^{-\lambda}\cosh\lambda$, and the estimate takes the form

\[ P_E(s,n) \le (\cosh\lambda)^n \frac{s!}{\lambda^s n^s}, \tag{3.5.5} \]

where $\lambda > 0$ can be chosen arbitrarily.

We now estimate the sum $\sum_{m=1}^{T} \binom{T}{m} P_E(rm,n)$.

Lemma 3.5.1. If $r \ge 3$ is fixed, and $T, n \to \infty$ such that $T/n \to \alpha$, then for any $\varepsilon > 0$, there exists $\delta > 0$ such that

\[ \sum_{1 \le m \le \delta T} \binom{T}{m} P_E(rm,n) < \varepsilon. \]

Proof. First we point out that

\[ P\{\xi_1^{(E)} = 2k\} = P\{\xi_1 = 2k \mid \xi_1 \in E\} = \frac{\lambda^{2k}}{(2k)!\,\cosh\lambda}, \quad k = 0, 1, \dots, \qquad E\xi_1^{(E)} = \lambda\tanh\lambda. \]
3.5 Hypercycles and consistency of systems of random equations 161 Put x = rm/n and choose the parameter X of the Poisson distribution in such a way that x = X tanh A.. From C.5.5), it follows that (rm)\ PE{rm,n) < (coshA)" Xrmnrm Since the value of x becomes small for sufficiently small 8, we can assume that X < 1 in the domain of summation. For such X, and therefore X2/4 <x = X tanh X < A2, cosh A. < ex < eAx. We now estimate the sum. It is easy to see that \m)')s E Km<ST V 7 Km<ST l<m<ST Tme4xn Km<ST x 7 m \<m<ST r\rl2 -) rr'2-leAr8r'2-1 Since T/n tends to a constant, the last sum can be made arbitrarily small by choosing a sufficiently small 8. ¦ Lemma 3.5.2. If r is fixed, and T, n —> oo such that T/n —> a, 0 < a < 1, then for any s > 0, there exists 8 > 0 swc/z Y^ 0 PE(rm,n) < s. Proof. Put X = rm/n and let an integer mo be chosen such that mo/T < 8. With such a choice of A., by C.5.5), ^ )PE(rm,n)< ^
162 Systems of random linear equations in GFB) Since in the domain of summation, k is greater than some positive constant, there exists q < 1 such that e-xcoshk = A +e~2x)/2 <q. By using the inequality (rm)\ < c(rm)rme~rm(rT)l/2, where c is a constant, we obtain ]T ( )PE(rm,n) < c(rT){/2 ]T ( T\e~xcoshk)n m=T-mo m=T-m0 ^ ' x—* I * \Q \i — Q) n \ II ; m=T-mo / a \"~T <c{rT)x'2{ mo/in-T) ) Since q, a < 1, the value mo/(n — T) can be made arbitrarily small by choosing a sufficiently small 8, and therefore the value q/{\ — q)mo/(n-T) can be made smaller than some Q < 1. Thus, for a sufficiently small 8, the right-hand side tends to zero under the conditions of the lemma. ¦ Proof of Theorem 3.5.1. We now estimate the middle part of the sum. As T/n -> a and 8 < m/T < 1 — 8, the values x = rm/n lie in an interval of the form 0 < xq < x < x\ < oo. When we apply Theorem 3.5.2, we obtain for even rm, PE(rm,n) = (P{^ e E})n (?- ^ e ke/ a uniformly in x, xo < x < x\, where x — E%[E) = ktanhk, a2 = D^(?) = k2+x-x2. From P{^i g E] = e~x coshk, we obtain the final estimate: As T, n -> oo, T/n -* a, PE(rm,n) = (coshk)n (^-Y" —A + o(l)) C.5.6) uniformly in m, 8 < m/T < 1 — 8. Setting p = m/T, q = I — p, and using the normal approximation to the binomial distribution show that, as T -> oo, (T) = (T)pm<iT-m(pm<iT-mr{ = m t \ \mj \mj pmqT- uniformly in m, 8 < m/T < 1 — 8.
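Let $\alpha = T/n$ and write $p = m/T$ in terms of $x = rm/n$ and $\alpha$. Then

\[ \frac{m}{T} = \frac{x}{\alpha r}, \qquad \frac{T - m}{T} = \frac{\alpha r - x}{\alpha r}, \]

and the estimate of $\binom{T}{m}$ takes the following form. As $T \to \infty$, $\delta \le m/T \le 1 - \delta$,

\[ \binom{T}{m} = \frac{\alpha r}{\sqrt{2\pi x(\alpha r - x)\alpha n}} \left(\frac{(\alpha r)^{\alpha}}{x^{x/r}(\alpha r - x)^{\alpha - x/r}}\right)^{n} (1 + o(1)) \tag{3.5.7} \]

uniformly in $m$. We combine the estimates (3.5.6) and (3.5.7) and obtain

\[ \binom{T}{m} P_E(rm,n) = \bigl(f(\alpha,x)\bigr)^n \frac{2\alpha r}{\sigma\sqrt{2\pi(\alpha r - x)\alpha n}} (1 + o(1)), \]

where

\[ f(\alpha,x) = \cosh\lambda \left(\frac{x}{\lambda e}\right)^{x} \left(\frac{\alpha r - x}{x}\right)^{x/r} \left(\frac{\alpha r}{\alpha r - x}\right)^{\alpha}, \qquad x = \lambda\tanh\lambda. \]

The function $f(\alpha,x)$ increases as $\alpha$ increases,

\[ f'_x(\alpha,x) = f(\alpha,x) \log\left(\frac{x}{\lambda}\left(\frac{\alpha r - x}{x}\right)^{1/r}\right) \to -\infty, \quad x \to 0, \]

and the derivative $f'_x(\alpha,x)$ has no more than two zeros. Therefore the system of equations

\[ f(\alpha,x) = 1, \qquad f'_x(\alpha,x) = 0, \qquad \lambda\tanh\lambda = x \tag{3.5.8} \]

has the unique solution $(\alpha_r, x_r, \lambda_r)$; at this point, the function $f(\alpha_r, x)$ as a function of $x$ attains its maximum, which is equal to 1. Therefore, for all $x$, $0 < x < \alpha_r r$,

\[ f(\alpha_r, x) \le f(\alpha_r, x_r) = 1. \]

In addition,

\[ f(\alpha, x) < f(\alpha_r, x) \le 1, \quad \alpha < \alpha_r, \qquad f(\alpha, x_r) > f(\alpha_r, x_r) = 1, \quad \alpha > \alpha_r. \]

This implies that the middle part of the sum tends to zero for $\alpha < \alpha_r$ and tends to infinity for $\alpha > \alpha_r$.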
If we consider the estimates for the tails of the sum in Lemmas 3.5.1 and 3.5.2, we obtain the assertion of Theorem 3.5.1 because system (3.5.8) can be easily transformed to the form mentioned in the statement of the theorem. ∎

It would be interesting to find the limit distribution of the number of hypercycles. Up to now, no one has succeeded even in proving that $S(A_{r,n,T})$ tends in probability to infinity as $T, n \to \infty$, $T/n \to \alpha > \alpha_r$.

3.6. Reconstructing the true solution

We consider the system of equations in GF(2):

\[ x_{i(t)} + x_{j(t)} = b_t, \quad t = 1, \dots, T, \tag{3.6.1} \]

where the pairs $(i(t), j(t))$, $t = 1, \dots, T$, are independent identically distributed two-dimensional random vectors that take the values $(i, j)$, $i < j$, $i, j = 1, \dots, n$, with equal probabilities $\binom{n}{2}^{-1}$.

In Section 3.1, we interpreted a system similar to (3.6.1) as the result of $T$ trials performed with the aim of classifying $n$ objects by random pairwise comparisons, and we set $b_t = 0$ if the comparison of $x_{i(t)}$ and $x_{j(t)}$ showed that these objects were from the same class, and $b_t = 1$ otherwise, for $t = 1, \dots, T$. If the comparisons are not absolutely reliable, then the result of a comparison may deviate from the true value.

Suppose that $X^* = (x_1^*, \dots, x_n^*)$ is the vector of true values of the unknowns, and the column-vector $B^* = (b_1^*, \dots, b_T^*)$ is obtained by substituting $X^*$ into the left-hand side of system (3.6.1):

\[ AX^* = B^*, \tag{3.6.2} \]

where $A$ is the matrix of system (3.6.1). If the measurements are not precise, then it is natural to suppose that $b_t = b_t^* + \varepsilon_t$, $t = 1, \dots, T$, where $\varepsilon_1, \dots, \varepsilon_T$ are independent identically distributed random variables that do not depend on $A$ and take the values 0 and 1. These random variables can be interpreted as errors. Let

\[ p = \frac{1 - \Delta}{2} = P\{\varepsilon_1 = 1\}, \qquad q = \frac{1 + \Delta}{2} = P\{\varepsilon_1 = 0\}, \tag{3.6.3} \]

where $\Delta$ is called the excess. The problem is to estimate or reconstruct the vector $X^* = (x_1^*, \dots, x_n^*)$ on the basis of the matrix $A$ and the right-hand side $B = (b_1, \dots, b_T)$ of system (3.6.1).
In a similar situation over the field of real numbers, an estimate of the true solution of a system of linear equations with perturbed right-hand sides can be found by the least-square method. Under some conditions on the matrix and the
3.6 Reconstructing the true solution 165 errors in the right-hand sides, the least-square method provides an estimate that converges to the true solution as the number of equations tends to infinity. In contrast to the field of real numbers, in GFB) a good estimate X = (x\,..., xn) coincides with the true solution X* = (jc*, ..., X*) with probability tending to 1 as T -> oo. As usual, we associate the graph Tnj with the left-hand side of system C.6.1). The graph TnT has n labeled vertices corresponding to the unknowns jci, ..., xn and T edges et = (i(t), j(t)), t = 1,..., T. The edges e\,..., ej are independent and assume the /?(/? — l)/2 possible values with equal probability. Therefore, the graph Tnj may have multiple edges. It is clear that along with the vector X*, the vector X* = (jc*, ..., jc*) with elements jc* = jc* + 1, t = 1,..., /?, satisfies the system C.6.2). The pair X*, X* is uniquely determined by the system C.6.2) if the graph Tnj is connected, in other words, if the system C.6.2) contains all the unknowns and is not decomposed into subsystems with disjoint sets of unknowns. Denote by pnj the probability that the graph Tnj is connected. It follows from Theorem 2.3.8 that if n, T -» oo such that T = /?log/? +an +o(n), where a is a constant, then -e~a Pnj -*- e Thus, if n, T —> oo and the pair X*, X* is determined by the system C.6.2) with probability tending to 1, then T in n\ogn log/?' where wn —> oo. In this section, we present three algorithms for reconstructing the true solution of system C.6.1) with perturbed right-hand sides. We first describe the reconstruc- reconstruction method that can be called the voting algorithm. This algorithm consists of correcting the right-hand sides b\, ..., bj of the system C.6.1) by the majority rule. Let the system C.6.1) contain the subsystem with /w,-y, / < j, equations: Xi + Xj = atj , C.6.4) The true value of a\ ¦ , ..., a™.1' equals a*. = jc* + jc*. We set atj = 1 if and aij = 0 otherwise.
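The voting algorithm can be sketched as follows. This is an illustration under stated assumptions, not the book's implementation: equations are represented as triples `(i, j, b)` with 0-based indices meaning $x_i + x_j = b$ in GF(2), ties in the majority rule are resolved to 0, and the final step (fixing $x_0 = 0$ and propagating along the corrected equations) is one possible way to pass from the corrected right-hand sides to a solution, which is determined only up to the complementary vector. The names `vote` and `reconstruct` are hypothetical.

```python
from collections import defaultdict


def vote(equations):
    """Majority rule per pair: returns {(i, j): corrected right-hand side}.

    equations: iterable of (i, j, b) with i != j, meaning x_i + x_j = b mod 2.
    A pair gets 1 only if strictly more than half of its equations say 1.
    """
    ones = defaultdict(int)
    total = defaultdict(int)
    for i, j, b in equations:
        key = (min(i, j), max(i, j))
        ones[key] += b
        total[key] += 1
    return {key: 1 if 2 * ones[key] > total[key] else 0 for key in total}


def reconstruct(n, corrected):
    """Propagate x_0 = 0 through the graph of corrected equations.

    Returns a list of n bits, or None if the graph is not connected
    (then the pair X, complement of X is not determined by the system).
    """
    adj = defaultdict(list)
    for (i, j), a in corrected.items():
        adj[i].append((j, a))
        adj[j].append((i, a))
    x = [None] * n
    x[0] = 0
    stack = [0]
    while stack:
        u = stack.pop()
        for v, a in adj[u]:
            if x[v] is None:
                x[v] = x[u] ^ a  # x_u + x_v = a  =>  x_v = x_u + a
                stack.append(v)
    return None if any(v is None for v in x) else x


if __name__ == "__main__":
    noisy = [(0, 1, 1), (0, 1, 1), (0, 1, 0), (1, 2, 0),
             (2, 3, 1), (2, 3, 1), (2, 3, 0), (0, 3, 0)]
    print(reconstruct(4, vote(noisy)))
```

In the example, one of the three copies of each repeated equation is wrong, and the majority rule removes the error before the propagation step.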
166 Systems of random linear equations in GFB) Under some conditions, system C.6.1) is indecomposable and a(/ = a*- for all i, j = 1,...,«; thus the true solution is reconstructed. Denote by P(n, T) the probability of reconstructing the true solution of system C.6.1) by the voting algorithm, that is, P(n,T) = P{aij=a*j, i, j = 1,...,«). Theorem 3.6.1. Ifn, T —> oo and A —> 0 such that A2T n2 log/? then P{n, T) -* 1. Proof. Let i, T) = min oo, where the minimum is over all subsystems of the form C.6.4). It is clear that P(n, T) = P{fi(n, T) > m}Pm(n, T) + P{/x(n, T) < m}Pm(n, T), C.6.5) where Pm(n, T) and Pm(n, T) are the conditional probabilities of reconstruct- reconstructing the true solution under the conditions {//,(«, T) > m] and {/x(n, T) < m}, respectively. We obtain a rough estimate for the probability P{n(n, T) > m}. It is clear that P{n{n, T) > m} is the probability that each cell contains more than m particles in the classical scheme of allocating T particles into B) cells. Denote by rn the number of particles in the ith cell and put ?,• = 1 if m < m, and ?,• = 0 if m > m, i = l,...,g). By A.1.1), P{fi(n, T)<m} = P{^i + • • • + ^q > 0} The random variable 771 has the binomial distribution with T trials and the probability of success Q)" . Since a = E771 = T/{^) -> 00, the normal ap- approximation is valid for this distribution. We choose m = a{\ — A), assume that (As/aK/ T -> 0, and estimate the probability P{^i < m}. By taking into account the choice of m and the equality D771 = a(l + o{\)), we obtain P{/71 < n] = P{(m - a)/fihH < (m - a)/y / e~u '2du{\+o(\)). J-oq
3.6 Reconstructing the true solution 167 Hence, there exists a constant c such that P{/7i <m) <ce-A2a/2. Thus, for m = a( 1 - A), ,T)<m}^»0 C.6.6) because A2a/ log/? -» oo and n2e~Aa/2 -» 0. Now we have to show that under the conditions of the theorem, Pm(n,T) -» 1. In other words, we have to prove that aij = a*, for all i, j = 1,..., n with probability tending to 1. The additional requirement of the indecomposability of the system C.6.1) or of the connectedness of the graph Tnj is obviously fulfilled. Recall that bt = bf + st. We may assume that in the subsystem C.6.4), aW=atj+e%\ *= 1,...,«,;, where the random variables s] ¦',..., ei™'J are independent and have the same distribution as ei,..., ej from the right-hand side of C.6.1). Denote by ?(«, T) the number of wrong decisions, that is, the number of realized events {aij ^ a*.}, i,j=l,...,n. Now let fy = 1 if ejP + • • • + ef}ij) > mo/2, and ^7 = 0 otherwise. It is clear that the number of wrong decisions can be represented in the form and 1 - Pm{n, T) = P{?(«, T) > 0 | fi(n, T) > m) B) = 1 \nin,T)>m}. C.6.7) Now we derive estimates for P(?i2 = 1 I ti(n, T)>m} = P{e$ +¦¦¦+ e(™l2) > mn/2 \ fi(n, T) > m). The random variables s\2 ,. •., s^ are independent and have the same distribu- distribution as the random variables ei,... ,?t from the right-hand side of system C.6.1). We set Sk = s\ H \- ?k and estimate P{Sk > k/2} = P{Sk - ESk > kA/2}. Here, and later in this section, we use the following inequality of exponential type for the sum Sk that was proposed by Hoeffding [59] and can be found in [122] (see
168 Systems of random linear equations in GFB) Theorem 1.1.16). For any positive A, P[Sk-ESk> kA/2} <e~kA2/2. C.6.8) Therefore and from C.6.7), we obtain ("XA2/2 C.6.9) Form = a(l - A), a = T/Q), under the conditions of Theorem 3.6.1, the right- hand side of C.6.9) tends to zero. Thus, the assertion of the theorem now follows from C.6.5), C.6.6), and C.6.9). ¦ We now describe the second algorithm for reconstructing the true solution of system C.6.1), which can be called the method of coordinate testing. We choose a vector X^ = (x[ \ ..., ^0)) by random sampling from the set of all n -dimensional vectors over GFB). Denote by B{0) = (bf\ ...,bf}) the column-vector obtained by substituting X^ for X in the left-hand side of C.6.1). Let p(X^) be the number of the coordinates of 5@) that coincide with the corresponding coordinates of the vector B = (b\,..., bj) of the right-hand sides of system C.6.1). We construct a vector X(l) = (x[1},..., x^}) from X@) and system C.6.1) and show that, with probability tending to 1, the vector coincides with the true solution X*. Therefore we consider the vectors W) @) Q @) @) (P) @) 1 @) @) and calculate the values P(Xiyo) and fi(Xiti), defined for the vectors Xito and Xiy in the same way yS(X@)) was defined for X{0). For i = 1,...,«, let Denote by %(X) the number of coordinates of the vectors X and X* that coin- coincide. The value where X = {x\,... ,xn) = (x\ + 1,..., xn + 1), is called the number of coinci- coincidences.
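One pass of coordinate testing can be sketched as follows. The sketch assumes that equations are stored as triples `(i, j, b)` with 0-based indices, that each coordinate is tested against the original vector $X^{(0)}$ (not against already updated coordinates), and that the value giving the larger number of coincidences $\beta$ is kept; the tie rule (keep the current value) is an assumption of this illustration, as are the function names.

```python
def beta(x, equations):
    """Number of equations x_i + x_j = b (mod 2) whose right-hand side x reproduces."""
    return sum(1 for i, j, b in equations if (x[i] ^ x[j]) == b)


def coordinate_testing(x0, equations):
    """Build X^(1) from X^(0): test the values 0 and 1 in every coordinate."""
    x1 = []
    for i in range(len(x0)):
        with_zero = list(x0)
        with_zero[i] = 0
        with_one = list(x0)
        with_one[i] = 1
        b0 = beta(with_zero, equations)
        b1 = beta(with_one, equations)
        if b0 > b1:
            x1.append(0)
        elif b1 > b0:
            x1.append(1)
        else:
            x1.append(x0[i])  # tie: keep the current value (assumption)
    return x1


if __name__ == "__main__":
    truth = [0, 1, 1, 0, 1]
    # error-free equations for every pair of unknowns
    eqs = [(i, j, truth[i] ^ truth[j]) for i in range(5) for j in range(i + 1, 5)]
    print(coordinate_testing([0, 1, 1, 0, 0], eqs))
```

In this error-free example the initial vector has four true coordinates out of five, and a single pass restores the true vector. An initial vector with mostly wrong coordinates would instead lead to the complementary vector, which satisfies system (3.6.2) as well.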
3.6 Reconstructing the true solution 169 Lemma 3.6.1. If n —> oo, then the distribution of the random variable Bt](X^)) — ri)lsfn converges weakly to the distribution of the modulus of the random variable that has the normal distribution with parameters @, 1). Proof. Since the vector A^0) is chosen from the set of all ^-dimensional vectors by random sampling with equal probabilities, the random variable Sn = ?(A^0)) has the binomial distribution with parameters («, 1/2). From the obvious equality ?j(X) + ?j(X) = n, the random variable ij(X^) is represented in the form It is clear that - n) = max -=^, -=- \2Sn -Al|, and the assertion of Lemma 3.6.1 follows from the convergence of the distribution of {2Sn — n)/sfn to the normal distribution with parameter @, 1). ¦ We can now prove the following assertion concerning the algorithm of coordi- coordinate testing. Theorem 3.6.2. Ifn, T -> oo and A -> 0 such that A2T > nz\ogn then P{X{1) =X*} Proof. For definiteness, assume oo, The coordinates of X^ that coincide with the corresponding coordinates of X* are called true, whereas those that do not coincide are called wrong. For the algorithm of coordinate testing to lead to the true solution, the following obvious conditions must be fulfilled. For each coordinate of the vector X®\ the value of fi(X^) must increase if we replace the wrong value of the coordinate by the true value, and the value of fi(X^) must strictly decrease if we replace the true value by the wrong one. We separate all the equations of the system C.6.1) that contain jc,- , and denote the number of such equations by «,-. Replacing x( by x( changes the contribution in fi(X^) of these equations only, and each equation containing jc,- contributes
170 Systems of random linear equations in GFB) 1 or -1. If jc(-0) is wrong, then the increment of /}(X@)) due to replacing x\ by xf] is equal to the random variable fr(X{0)) such that (/J,(X@)) + «,)/2 has the binomial distribution with parameters («,, /?,), where /?,- is the probability that the coincidence in a fixed equation containing x, appears after substituting x\ for xf\ provided x\ ) is wrong. It is not difficult to see that Pi = vq + (l-v)p, C.6.10) whereg = P{b, = b*},p = 1 — q, and v is the probability that the second variable in the equation has the true value. The second variable takes values from the set with equal probabilities. Therefore v = (k - l)/(« - 1), where k is the num- number of true coordinates of X^\ which equals ^(X^) under the assumption that > HX{0)). It follows from Lemma 3.6.1 and equality C.6.10) that which we write as Pi = - + -—7=r, C.6.11) 2 2v« where n-\ By assumption, ?„ is asymptotically normal with parameters @, 1). Therefore > (A27/(«2log«))/4} -> 1 C.6.12) because A2T/(n2\ogn) -> oo. Next, we find a lower bound for m, i = 1,...,«. To this end, we take into account only the first variable in each equation. Then we obtain the classical scheme of equiprobable allocation of T particles into n cells, and by applying the corresponding results on the distribution of the minimum of contents of cells [90], we find that Pf min m > T/Bn)\ -> 1. C.6.13) I \<i<n J
3.6 Reconstructing the true solution 171 For the increments &(X( '), we have P[^(X(O))<O\xii°) is wrong) = P{(t;(X{0)) + m)/2 < mil | xjo) is wrong) = P{SB, <ru/2), where S,u has the binomial distribution with parameters («,-, pi). From C.6.11), we find that P{SB, < m/2) = P[Sni - ESni < A|?B|/!//Bv^)}. When we use estimate C.6.8) of the exponential type for the binomial distribution and take into account C.6.12) and C.6.13), we obtain x-l/2 P{Snt < m/2} < exp j n2 \n2 log/i In a similar way, we obtain the bound > 0 | jc/0) is true) < exp - A2T ( A2T -1/2 n2 \«2log« Therefore an upper estimate for the probability of at least one wrong decision while testing all the coordinates of the vector X^ is i=\ < 2n exp < 0 | jcP is wrong) + a 2t / a 2t \ -1/2 > 0 is true}) n2 \n2\ognj and tends to zero under the conditions of the theorem because A2T/(n2\ogn) -> oo. With the help of a preliminary search of the n -dimensional vectors, it is possible to select an initial vector X^ with a great number t]{X^) of coordinates coincid- coinciding with the corresponding coordinates of the true solution X*.\f the algorithm for coordinate testing begins with this initial vector, then a much smaller number of equations is needed to reconstruct the true solution. This number is comparable to the number of edges needed for the graph Tnj to be connected. Theorem 3.6.3. Ifn, T -> oo and A -> 0 such that A2T n log/? oo,
172 Systems of random linear equations in GFB) then there exists an algorithm that reconstructs the true solution of system C.6.1) with probability tending to 1. Proof. The algorithm, which gives the true solution under the conditions of the theorem, begins with a preliminary search of an initial vector Ar@) with a large number of coincidences with the true vector X*. The choice of X^ is determined by a search of all ^-dimensional vectors. To this end, we choose the level / = Tq-uTVf, where q = P{bt = bf} = A + A)/2andur = A^T/18, and select the vectors Xfor which fi(X) > /. Recall that fi(X) is the number of coincident coordinates of the vector B = {b\,..., bj) and the vector of the right-hand sides of system C.6.1) that are obtained when Xis substituted into the left-hand side of the system. The vector X* will be selected with probability tending to 1. Indeed, P{P(X*) <Tq- utVt] = P{ST - where Sj is the number of successes in T independent trials with the probability of success equal to q = A + A)/2. By using estimate C.6.8), we find that P{P(X*) <l}<e~u2T/2, and the complementary probability P{/3(X*) > /} ->¦ 1 because uj ->¦ oo. If %(X) = s, then the probability of the coincidence of a fixed component of the right-hand sides is qs(s-\) q(n - s)(n - s - 1) 2s{\ - q){n - s) P(s) = — — + + , n(n — 1) n(n — 1) n(n — 1) and, since q = A + A)/2, we find 1 ABs - n)Bs - n + 1) P(s) = - H — . 2 2n(n — 1) For example, let s <2n/3. Then p(s) < 1/2 + A/9, beginning with some n, and for any fixed X with %(X) = s < 2n/3, > /} = P{ST >Tq- u < P{ST - EST > 7AT/18 - uTVf] = P{ST - EST > Ar/3}, where Sr is the number of successes in T independent trials with probability p{s) of success. By using the inequality C.6.8) of exponential type, we find
The probability that at least one of the vectors $X$ with $\eta(X) \le 2n/3$ is selected does not exceed $2^n e^{-\Delta^2 T/18}$, and under the conditions of the theorem this probability tends to zero.

Thus, with the help of the exhaustive search, it is possible to select, with probability tending to 1, a vector $X^{(0)}$ such that $\eta(X^{(0)}) > 2n/3$. Beginning the algorithm for coordinate testing with this vector $X^{(0)}$, we find, using the notation introduced in the proof of Theorem 3.6.2, that

\[ P\{\beta_i(X^{(0)}) \le 0 \mid x_i^{(0)} \text{ is wrong}\} = P\{S_{n_i} - ES_{n_i} \le -\Delta|\zeta_n| n_i/(2\sqrt{n})\}. \]

Using estimate (3.6.8) and taking into account that, with probability tending to 1, $|\zeta_n| > \sqrt{n}/3$ for the selected vector and $n_i \ge T/(2n)$, we find the estimate

\[ P\{\beta_i(X^{(0)}) \le 0 \mid x_i^{(0)} \text{ is wrong}\} \le P\{S_{n_i} - ES_{n_i} \le -\Delta n_i/6\} \le e^{-\Delta^2 T/(36n)}. \]

Similarly we obtain

\[ P\{\beta_i(X^{(0)}) \ge 0 \mid x_i^{(0)} \text{ is true}\} \le e^{-\Delta^2 T/(36n)}. \]

As in the proof of Theorem 3.6.2, an upper bound for the probability of at least one wrong decision, while all $n$ coordinates of $X^{(0)}$ are tested, is $2n e^{-\Delta^2 T/(36n)}$ and tends to zero under the conditions of the theorem. ∎

Thus, if we use the exhaustive search, then the true solution can be reconstructed under the condition $\Delta^2 T/(n \log n) \to \infty$. If the number of equations $T$ is such that $\Delta^2 T/(n^2 \log n) \to \infty$, then the reconstruction can be realized by the voting algorithm, which is more economical with respect to the number of operations. Clearly, there is considerable interest in algorithms that lead to the true solution with probability tending to 1 under intermediate conditions on the number of equations $T$ and do not require the exhaustive search of all $2^n$ vectors.

Let us describe an algorithm that will be referred to as $A_2$. Consider all $\binom{T}{2}$ equations obtained as the pairwise sums of the equations of the system (3.6.1). Among the equations obtained by this operation, there are equations that contain either four, or two, or zero unknowns each. Denote by $S_2$ the subsystem that includes all the equations with two unknowns each.
The algorithm $A_2$ ends with the application of the voting algorithm to the subsystem $S_2$. The following theorem gives the conditions under which the algorithm $A_2$ reconstructs the true solution.

Theorem 3.6.4. If $n, T \to \infty$ and $\Delta \to 0$ such that $\Delta^4 T^2/(n^3 \log n) \to \infty$, then the algorithm $A_2$ reconstructs the true solution with probability tending to 1.
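The construction behind the algorithm can be sketched in a few lines. The sketch below is only an illustration under stated assumptions: each equation of the system is represented as a hypothetical triple `(i, j, b)` meaning x_i + x_j = b in GF(2), the function names are invented here, and the "vote" is the simple majority over all pairwise sums that involve the same two unknowns. It covers the formation of the two-unknown subsystem and the majority decision for each pair, not the full reconstruction of the solution.

```python
from collections import defaultdict
from itertools import combinations


def pairwise_two_unknown_votes(equations):
    """Collect right-hand sides of pairwise sums that contain exactly two unknowns.

    equations: list of (i, j, b) with i != j, meaning x_i + x_j = b over GF(2).
    Returns a dict mapping a sorted pair (i, j) to the list of observed sums.
    """
    votes = defaultdict(list)
    for (i1, j1, b1), (i2, j2, b2) in combinations(equations, 2):
        # Adding two equations over GF(2) cancels any unknown they share,
        # so the surviving unknowns are the symmetric difference.
        unknowns = {i1, j1} ^ {i2, j2}
        if len(unknowns) == 2:  # skip sums with four or zero unknowns
            votes[tuple(sorted(unknowns))].append(b1 ^ b2)
    return votes


def majority_estimates(votes):
    """Estimate x_i + x_j as 1 when more than half of the observed sums equal 1."""
    return {pair: int(2 * sum(bs) > len(bs)) for pair, bs in votes.items()}
```

With undistorted right-hand sides every vote for a pair agrees, so the estimates coincide with the true sums; distortion makes the majority decision meaningful only when enough equations vote for each pair.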
174 Systems of random linear equations in GFB) Proof. Let i and / be arbitrary, assume i < /', and consider all equations of the system S2 of the form C.6.14) + Xj = b(j The equality m/y = m means that the graph Vnj corresponding to system C.6.1) contains exactly m vertices, say v\,..., vm, such that the graph Tnj contains the edges (v\, i), {v\, j),..., (vm, i), (vm, j). The right-hand sides b\j ,..., btjlj are the pairwise sums of 2m ,j independent random variables, and therefore they are independent and, according to Lemma 3.2.1, take the true value b*. = x* + xj with probability A + A2)/2 and the wrong value with probability A - A2)/2. Let bij = 1 if b^ H + bf^'j) > /w/y/2, and bi} = 0 otherwise. As in the proof of Theorem 3.6.1, we denote by fi(n, T) the minimum value of /Ho- /Hoover all subsystems of the form C.6.14). As in C.6.5), the probability P(n, T) of reconstructing the true solution can be represented in the form P(n, T) = P{fx(n, T) > m}Pm(n, T) + P{^{n, T) < m}Pm{n, T), C.6.15) where Pm(n, T) and Pm(n, T) are the conditional probabilities of reconstructing the true solution by the majority method under the condition that {[x(n, T) > m} and {//.(«, T) < m), respectively. As in the proof of Theorem 3.6.1, we need to estimate P{n(n, T) > m}, but here this estimation is more laborious. Let %ij = 1 if mi j < m, and ?,-_,- = 0 if m,y > m, i < j, i, j = 1, ...,«. It is clear that Let ixi = 1 if the edges (i + 2, 1) and (i + 2, 2) occur in Tnj, and m = 0 otherwise; and v; = 1 if exactly one of the edges (i + 2, 1), (i + 2, 2) occurs in Tnj, and v; = 0 if the edges (/ + 2, 1) and (/ + 2, 2) do not occur in Tnj. The random variable m 12 can be represented as the following sum of indicators: "M2 = Mi H h y.n-2,
3.6 Reconstructing the true solution 175 and m J2 J2 ^i-'*' C-6J7) where />,-, ...,t is the probability that //,;,,..., mk take the value 1 and all the other random variables take the value 0. It is not difficult to see that (Mt, Nt), where Mt = Mi H h Ht> M = vi H h v,, is a Markov chain because (/^+1, vt+ \) depends only on the number of edges used to construct the random variables ix\,..., ixt, v\,..., vt, t = 1,..., n — 2. More precisely, let p(t | y,_i,Z,_i) 7,_i, Z,_i) = P{m, = 0 By using this notation, we can write the probability pix...ik that ^t,-,,..., ^ take the value 1 and all the other random variables take the value 0 in the form Pi\...ik = M = 0, / # /"i,..., ik I vi = z\,..., vn^2 = zn-2) = q(\ | Y0,Z0)---q(h-\ where Zo = 7o = 0, Zt = z\ + ¦ • • + zt, and Yt is the number of i\, ..., i^ do not exceed t. We now estimate the probabilities p{t \ Y, Z) and q{t \ Y,Z). It is clear that p(t | 7, Z) + #(f | 7, Z) = 1, and the probability p{t \ 7, Z) does not depend on t and equals the probability pi{s, N) that two fixed places corresponding to the edge A, t), B, t) will be occupied after allocating s = T — 27 — Z edges into N = B) — Z places in the classical scheme of allocation of particles. Therefore s\ ( 2\s~2 ^ N
176 Systems of random linear equations in GFB) and we have the following estimates: s(s-\)/ 2V~2 »*.»> s(s-\) ^ (j-2)! / _ 2_ TV2 /^ k\ll(s-k-l-2)\Nk+l \ N IV _ d _ 1 C, V + V N) JV2 Since T-3n<s = T-Y-Z<T, n(n - 3)/2 < N = (j - Z < n(n - we obtain for all & = 0, 1,... ,n — 2, n- ¦ < Pknn~k~2 Pi\...ik 5: -r y , where P = max^(/ | Yt-\, Zt-\), Q = l-min/>(f I ^-l, ^-i). Therefore it follows from C.6.16) and C.6.17) that P{mi2 < m) < (P + Q)"-2P{h +¦¦¦+ $n-2 < m}t where ?i,..., %n-2 are independent identically distributed random variables, P{?i = 1} = P/(P + 0, P{^ = 0} = Q/(P + 0. As«,r^ ooandT/g) -> 0, P 47 and under the conditions of the theorem, (P + 0"-2= l+o(l). C.6.18) The random variable ?w-2 = ?i +•••+?„-2 has the binomial distribution with parameters (« - 2, P/{P + 0).
3.7 Notes and references 177 Let a = EC_2 = (n -2)P/(P + Q) andm = a(l - A2). We assume that is not too large, so that A4T2/nlQ/3 -> 0. Then, for sufficiently large n, ,-A2>/2 ./—00 '2jt J -00 and there exists a constant c such that 2 < m} < ce-A4a/8. C.6.19) Thus, by virtue of C.6.16), C.6.18), and C.6.19), P{fj,(n, T)<m} < cn2e-A4a/s -* 0 C.6.20) because, under the conditions of the theorem, &4T2/(n3 log n) ->¦ oo, and conse- consequently, n2e~A^8 -> 0. As in the proof of Theorem 3.6.1, we have to show that under the conditions of Theorem 3.6.4, the system 62 is indecomposable and Pin, T) —> 1. In other words, we have to show that bij = b*- for all /, _/ = 1,..., n with probability tending to 1. By the same reasoning as in the proof of Theorem 3.6.1, for m = a{\ — A2), we obtain l-Pm(n,T)<("X-m*4'4, C.6.21) and under the conditions of Theorem 3.6.4, the right-hand side of C.6.21) tends to zero. The assertion of the theorem follows from C.6.15), C.6.20), and C.6.21). ¦ 3.7. Notes and references The theory of systems of random equations in finite fields was developed by the Russian mathematicians V. E. Stepanov, G. V. Balakin, I. N. Kovalenko, A. A. Lev- itskaya, and others. The connection between systems of equations in GFB) and graphs was first pointed out and used by Stepanov. The notion of a critical set was introduced in [79] (see also [13] and [85]). The theory of recurring sequences and shift registers mentioned in Section 3.1 can be found in [50] and [156]. Theorems 3.2.1 and 3.2.2 were proved by Kovalenko in [92]. This brilliant result initiated a series of investigations of similar problems that were carried out by Kovalenko and his school. These investigations developed in two directions. The first direction concerns extensions of Theorems 3.2.1 and 3.2.2 to matrices
over more general algebraic structures. It is not difficult to see that, by virtue of the Markovian character of the process $\rho_n(t)$, a recurrence relation for $p_{n,T}(k) = P\{\rho_n(T) = k\}$ can be derived and used for the proof of Theorem 3.2.1. In this way, the extension of the result to a finite field with $q$ elements can be easily obtained [93]. Let the elements of the $T \times n$ matrix $A = \|a_{tj}\|$ in GF($q$) take the values $0, 1, \ldots, q-1$ with equal probabilities. Then the probabilities $p_{n,T}(k)$, $k = 0, 1, \ldots$, satisfy the equation

$$p_{n,T}(k) = z^n p_{n,T-1}(k) + (1 - z^n)\, p_{n-1,T-1}(k-1), \qquad (3.7.1)$$

where $z = 1/q$. Indeed, if the first row of $A$ is a zero vector, then $\rho_n(T) = \rho_n(T-1)$, and if the row contains at least one nonzero element, then $\rho_n(T) = \rho_{n-1}(T-1) + 1$. It follows from (3.7.1) that if $s \ge 0$ and $m$ are fixed integers, $m + s \ge 0$, $n \to \infty$, and $T = n + m$, then

$$P\{\rho_n(T) = n - s\} \to q^{-s(m+s)} \prod_{i=s+1}^{\infty} \left(1 - \frac{1}{q^i}\right) \prod_{i=1}^{m+s} \left(1 - \frac{1}{q^i}\right)^{-1}. \qquad (3.7.2)$$

The investigations in the second direction concern the bounds of invariance of the results of Theorems 3.2.1 and 3.2.2 with respect to the deviations of the distribution of elements of the matrix $A$ from the equiprobable distribution. The problem of the invariance and a proof of Theorem 3.2.3 are given in [91, 92]. A modified proof of Theorem 3.2.3 is contained in [93]. Theorem 3.2.4 can be easily extended to any moment of a fixed order of the number of solutions, but that is not sufficient for the proof of the invariance property, since the limit distribution (3.7.2) does not satisfy the sufficient conditions of the unique reconstruction by its moments; hence, Theorem 1.1.3 cannot be applied. Levitskaya [96, 97] presents results on the number of solutions of linear random systems over arbitrary rings and the corresponding results on the invariance of the moment and the limit distributions.
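The recurrence for the rank distribution stated above is easy to check numerically. The following is a minimal sketch, assuming the reading in which a nonzero first row raises the rank by one, so that the second term of the recurrence is taken at k - 1, and assuming the limit has the product form q^{-s(m+s)} times the two products over (1 - q^{-i}). The function names and the truncation parameter `terms` are illustrative choices, not notation from the text.

```python
def rank_distribution(n, T, q=2):
    """P{rank = k}, k = 0..min(n, T), for a uniform random T x n matrix over GF(q).

    Uses p_{c,t}(k) = z^c p_{c,t-1}(k) + (1 - z^c) p_{c-1,t-1}(k-1), z = 1/q.
    """
    z = 1.0 / q
    size = min(n, T) + 1
    # dist[c][k]: rank distribution for c columns and the current number of rows.
    dist = [[1.0] + [0.0] * (size - 1) for _ in range(n + 1)]  # zero rows
    for _ in range(T):
        new = [[1.0] + [0.0] * (size - 1)]  # zero columns: rank is always 0
        for c in range(1, n + 1):
            zero_row = z ** c  # probability that the added row is the zero vector
            row = [zero_row * dist[c][k] for k in range(size)]
            for k in range(1, size):
                row[k] += (1.0 - zero_row) * dist[c - 1][k - 1]
            new.append(row)
        dist = new
    return dist[n]


def limit_probability(s, m, q=2, terms=200):
    """Assumed limit of P{rank = n - s} for T = n + m, infinite product truncated."""
    prob = float(q) ** (-s * (m + s))
    for i in range(s + 1, terms):
        prob *= 1.0 - float(q) ** (-i)
    for i in range(1, m + s + 1):
        prob /= 1.0 - float(q) ** (-i)
    return prob
```

For example, comparing `rank_distribution(30, 30)[30]` with `limit_probability(0, 0)` shows how quickly the probability of full rank of a square matrix over GF(2) settles near its limiting value.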
These results are summarized in [93], where, in particular, the exact bounds for the invariance are given for random linear systems in arbitrary finite rings. For the system considered in Theorem 3.2.3, the exact bounds for $p_{ij}$ have the form $\delta_n \le p_{ij} \le 1 - \delta_n$, where $\delta_n = (\log n + x_n)/n$ and $x_n \to \infty$ arbitrarily slowly as $n \to \infty$.

Matrices that satisfy condition (3.3.1) were considered by Balakin [12], who also proved Theorems 3.3.1 and 3.3.2. Closer investigation of the estimates used in our proof of Theorem 3.3.1 allows us to obtain the following assertions.

Theorem 3.7.1. If $n \to \infty$, $T = n + \beta_n \log n$,
$\beta_n \to -\infty$, $\beta_n = o(n/\log n)$, and condition (3.3.1) holds, then the distribution of $s(A)$ converges to the Poisson distribution with parameter $e^{-x}$.

Theorem 3.7.2. If $n \to \infty$, $T = n + \beta \log n + o(\log n)$, $\beta$ is a constant, and condition (3.3.1) holds, then the distribution of $s(A)$ converges to the Poisson distribution with parameter $e^{-x}$ if $\beta < 0$, and with parameter e~x^ if $\beta > 0$.

Theorems 3.3.1, 3.3.2, 3.7.1, and 3.7.2 give a complete description of the behavior of the rank of such matrices, except for the case $\beta = 0$, where the behavior is unknown. Note that in [12], the analogues of Theorems 3.3.1, 3.7.1, and 3.7.2 are proved for the systems over GF($q$), $q > 2$ (see also [86]), and the connection between the rank of a matrix in GF($q$) and other characteristics such as the permanent rank and rank of lines is considered. The initial results on the ranks of random matrices are presented in [38] and [11].

Stepanov began investigating systems of linear equations of the form (3.4.1) with the help of their relations to random graphs. In particular, he proved Theorems 3.4.1 and 3.4.2. Now the theory of random graphs provides a basis for obtaining the results on the systems of random equations with coefficients taking their values with equal probabilities. If the coefficients of a system are essentially nonequiprobable, then there are no standard approaches to investigating its properties. Only a few results are known for such systems. We remark that at this time, graph theory is not sufficiently developed to answer questions about nonequiprobable cases. Only the method of moments (see Theorem 1.1.3) and the so-called direct methods are used to solve these problems.

Theorem 3.4.3 is a corollary to Theorem 2.4.1 proved in [88] by the method of moments. Theorems 3.4.4 and 3.4.5 are proved in [83].
The asymptotics of the probability of consistency of a system of linear equations in GF(2) (and in more general algebraic structures) with independent random coefficients that take the values 0 and 1 with equal probabilities have been obtained by Levitskaya [98] (see also [93]). This probability takes only two values and is the same for all possible right-hand sides of the system that are not the zero vector. It follows from Theorems 3.4.4 and 3.4.5 that the probability of consistency of the system (3.4.1) depends on the number of 1's in the vector of the right-hand sides of the system (see also [83]).

The results of Section 3.5 on the behavior of the probability of consistency of the system (3.5.1) can be found in [13] (see also [85]). Theorem 3.5.1 is proved by the author, but the critical values $\alpha_r$ were first obtained by Balakin under slightly different assumptions on the matrix $A_{r,n,T}$. These results are extended to GF($q$) in [89]. The proof of Theorem 3.5.2 is given in [87].
We can consider the probability of the consistency of a system from the point of view of mathematical statistics. Consider, for example, the system (3.4.1) and assume the following two hypotheses on the distribution of the right-hand sides of the system. Let the hypothesis $H_0$ be the existence of a vector $X^* = (x_1^*, \ldots, x_n^*)$, which is interpreted as the true solution of the system, and $b_t = x^*_{i(t)} + x^*_{j(t)}$, $t = 1, \ldots, T$. Under hypothesis $H_0$, system (3.4.1) is always consistent. Under the alternative hypothesis $H_1$, the right-hand sides $b_1, \ldots, b_T$ are independent random variables that are independent of the left-hand side of the system and take the values 0 and 1 with equal probabilities.

To distinguish between the hypotheses $H_0$ and $H_1$, we can use the consistency of the system as a test: If the system is consistent, we accept the hypothesis $H_0$, and we accept $H_1$ otherwise. Therefore the hypothesis $H_0$ is never rejected if it is true, and the error of the first kind, the probability of rejecting $H_0$ if it is true, is zero. The error of the second kind, the probability of accepting $H_0$ if it is wrong, is equal to the probability of consistency of the system (3.4.1). Thus, the probability of consistency is the main characteristic in the statistical problem of testing the hypotheses $H_0$ and $H_1$.

Section 3.6 is devoted to the other statistical problems that consist of reconstructing the true solution on the basis of a system of random equations with distorted right-hand sides. These results can be found in the paper [84].
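The consistency test for the two hypotheses described above can be illustrated with a short sketch. It assumes that each equation is given as a hypothetical triple `(i, j, b)` meaning x_i + x_j = b in GF(2); the union-find structure with parities is a standard technique chosen here for illustration and is not taken from the text.

```python
def is_consistent(n, equations):
    """Decide whether the system x_i + x_j = b over GF(2) has a solution.

    n: number of unknowns, indexed 0..n-1.
    equations: iterable of (i, j, b) with b in {0, 1}.
    """
    parent = list(range(n))
    parity = [0] * n  # parity[v] = x_v + x_parent[v] (mod 2); 0 for roots

    def find(v):
        """Return (root of v, x_v + x_root mod 2), compressing the path."""
        path = []
        u = v
        while parent[u] != u:
            path.append(u)
            u = parent[u]
        root, acc = u, 0
        # Start from the node nearest the root so parities accumulate correctly.
        for node in reversed(path):
            acc ^= parity[node]  # parity[node] was relative to its old parent
            parent[node], parity[node] = root, acc
        return root, parity[v]

    for i, j, b in equations:
        root_i, par_i = find(i)
        root_j, par_j = find(j)
        if root_i == root_j:
            # Both unknowns are already tied to one root: the equation is
            # either implied by earlier ones or contradicts them.
            if par_i ^ par_j != b:
                return False
        else:
            parent[root_i] = root_j
            parity[root_i] = par_i ^ par_j ^ b
    return True
```

In the language of the test, `is_consistent` returning `True` corresponds to accepting the hypothesis of a true solution, and right-hand sides generated from an actual vector always pass.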
Random permutations

4.1. Random permutations and the generalized scheme of allocation

Denote by $S_n$ the set of all one-to-one mappings of the set $X_n = \{1, 2, \ldots, n\}$ into itself. This set contains $n!$ elements. We consider a random permutation $\sigma$ that equals any element of $S_n$ with probability $(n!)^{-1}$. A permutation $s \in S_n$ can be written as

$$s = \begin{pmatrix} 1 & 2 & \cdots & n \\ s_1 & s_2 & \cdots & s_n \end{pmatrix},$$

where $s_k$ is the image of $k$ under the mapping $s$, $k = 1, \ldots, n$. The mapping $s$ can be represented also by the graph $\Gamma_n^{(s)} = \Gamma(X_n, W_n)$ whose vertex set is $X_n$ and whose edge set $W_n$ consists of the arcs $(k, s_k)$ directed from $k$ to $s_k$, $k = 1, \ldots, n$. Since exactly one arc enters each vertex and exactly one arc emanates from each vertex, the graph $\Gamma_n^{(s)}$ consists of connected components that are cycles, which are called the cycles of the permutation $s$.

Denote by $\Gamma_n$ the random graph corresponding to the random permutation $\sigma$, which takes the values $s$ with equal probabilities. It is obvious that $P\{\Gamma_n = \Gamma_n^{(s)}\} = (n!)^{-1}$.

In Section 1.3, we showed that the generalized scheme of allocation introduced in Section 1.2 can be applied to a wide class of problems related to the behavior of the connected components of random graphs. In Example 1.3.1, we showed that the generalized scheme can be used in the study of random permutations. Recall that in the generalized scheme, we separate the subset of graphs with exactly $N$ components, assign one of the $N!$ possible orders to the set of these components, and denote by $\eta_1, \ldots, \eta_N$ the sizes of the components. If there exist nonnegative identically distributed random variables $\xi_1, \ldots, \xi_N$
such that for any integers $k_1, \ldots, k_N$,

$$P\{\eta_1 = k_1, \ldots, \eta_N = k_N\} = P\{\xi_1 = k_1, \ldots, \xi_N = k_N \mid \xi_1 + \cdots + \xi_N = n\}, \qquad (4.1.1)$$

we say that the generalized scheme determined by the random variables $\xi_1, \ldots, \xi_N$ is applied to the random graph. As was shown in Example 1.3.1, the generalized scheme that corresponds to the random graph $\Gamma_n$ of a random permutation from $S_n$ is determined by the random variables $\xi_1, \ldots, \xi_N$ with the distribution

$$P\{\xi_1 = k\} = -\frac{x^k}{k \log(1 - x)}, \qquad k = 1, 2, \ldots, \quad 0 < x < 1, \qquad (4.1.2)$$

since the number of elements in $S_n$ is $a_n = n!$ and the number of connected realizations of the random graph $\Gamma_n$ is $b_n = (n-1)!$. For the random permutations, the corresponding generating functions have the form

$$A(x) = \sum_{n=0}^{\infty} \frac{n!\, x^n}{n!} = \frac{1}{1 - x}, \qquad B(x) = \sum_{n=1}^{\infty} \frac{(n-1)!\, x^n}{n!} = -\log(1 - x).$$

Thus the study of various characteristics of random permutations can be accomplished with the help of the generalized scheme. This is demonstrated for the most part in [78].

Recall some combinatorial identities that follow from the general results of Section 1.3. Let $\nu_n$ be the number of cycles in a random permutation from $S_n$. Lemma 1.3.3 gives the equality

$$P\{\nu_n = N\} = \frac{(B(x))^N}{N!\, x^n}\, P\{\xi_1 + \cdots + \xi_N = n\}. \qquad (4.1.3)$$

Denote by $\alpha_r$ the number of cycles of length $r$ in a random permutation from $S_n$, $r = 1, \ldots, n$. According to Lemma 1.3.7, for any nonnegative integers $m_1, \ldots, m_n$,

$$P\{\alpha_1 = m_1, \ldots, \alpha_n = m_n\} = \prod_{r=1}^{n} \frac{1}{r^{m_r}\, m_r!} \qquad (4.1.4)$$

if $m_1 + 2m_2 + \cdots + n m_n = n$, and the probability is zero otherwise.
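The formula for the joint distribution of the cycle counts can be checked by brute force for small n. The sketch below is an illustration only: it assumes the product form of the probability, with factors 1/(r^{m_r} m_r!), and uses invented function names; permutations are represented as tuples over 0, ..., n-1.

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations
from math import factorial


def cycle_type(perm):
    """Return (m_1, ..., m_n), where m_r is the number of cycles of length r."""
    n = len(perm)
    seen = [False] * n
    counts = [0] * (n + 1)
    for start in range(n):
        if not seen[start]:
            length, k = 0, start
            while not seen[k]:  # follow the arcs k -> perm[k] around the cycle
                seen[k] = True
                k = perm[k]
                length += 1
            counts[length] += 1
    return tuple(counts[1:])


def cycle_type_probability(m):
    """Probability of the cycle type m = (m_1, ..., m_n) under the uniform law."""
    n = len(m)
    if sum(r * m_r for r, m_r in enumerate(m, start=1)) != n:
        return Fraction(0)
    prob = Fraction(1)
    for r, m_r in enumerate(m, start=1):
        prob /= r ** m_r * factorial(m_r)
    return prob


def empirical_cycle_types(n):
    """Exact frequencies of cycle types obtained by enumerating all n! permutations."""
    counts = Counter(cycle_type(p) for p in permutations(range(n)))
    return {m: Fraction(c, factorial(n)) for m, c in counts.items()}
```

For n = 4, every frequency returned by `empirical_cycle_types(4)` agrees with `cycle_type_probability`; enumeration is feasible only for small n, which is the reason the closed formula matters.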
Let us introduce the generating function

$$\varphi_n(t_1, \ldots, t_n) = \sum_{M_n} P\{\alpha_1 = m_1, \ldots, \alpha_n = m_n\}\, t_1^{m_1} \cdots t_n^{m_n} = \sum_{M_n} \prod_{r=1}^{n} \frac{1}{m_r!} \left(\frac{t_r}{r}\right)^{m_r},$$

where the summation is over the set of integers $M_n = \{m_i \ge 0,\ i = 1, \ldots, n,\ m_1 + 2m_2 + \cdots + n m_n = n\}$. Put $\varphi_0 = 1$. It is not difficult to see that $\varphi_n(t_1, \ldots, t_n)$ is the coefficient of $u^n$ in the expansion of $\exp\{u t_1 + u^2 t_2/2 + \cdots\}$:

$$\sum_{n=0}^{\infty} \varphi_n(t_1, \ldots, t_n)\, u^n = \exp\left\{\sum_{n=1}^{\infty} \frac{u^n t_n}{n}\right\}. \qquad (4.1.5)$$

The generating function (4.1.5) was obtained by Goncharov and was the basis of his pioneering investigations of random permutations [53]. In [78], the approach based on the generalized scheme of allocations was used in such investigations. In the next sections, we will present some examples of how the generalized scheme of allocation can be applied to random permutations. This will supplement the investigations presented in [78].

4.2. The number of cycles

It is well known that the number of cycles $\nu_n$ in a random permutation from $S_n$ is asymptotically normal with parameters $(\log n, \log n)$ as $n \to \infty$. More precisely, as $n \to \infty$,

$$P\{\nu_n = N\} = \frac{1}{\sqrt{2\pi \log n}}\, e^{-u^2/2} (1 + o(1)) \qquad (4.2.1)$$

uniformly in the integers $N$ such that $u = (N - \log n)/\sqrt{\log n}$ lies in any fixed finite interval.

The approach based on the generalized scheme of allocation makes it possible to obtain the asymptotics of the probability $P\{\nu_n = N\}$ for all possible values of $N = N(n)$ as $n \to \infty$.

According to (4.1.3), for any integer $N$,

$$P\{\nu_n = N\} = \frac{(-\log(1 - x))^N}{N!\, x^n}\, P\{\xi_1 + \cdots + \xi_N = n\}, \qquad (4.2.2)$$

where the parameter $x$ can be taken arbitrarily from the interval $(0, 1)$, and $\xi_1, \ldots, \xi_N$ are independent identically distributed random variables with distribution (4.1.2).
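For moderate n the exact distribution of the number of cycles can be tabulated and set beside the normal approximation. The sketch below is a hedged illustration: it relies on a standard recurrence that is not taken from the text (when the element n is added to a permutation of n - 1 elements, it starts a new cycle with probability 1/n), and the function names are invented here. The approximation is the local normal form with mean and variance log n.

```python
from math import exp, log, pi, sqrt


def cycle_count_distribution(n):
    """Return [P{nu_n = N}] for N = 0..n for a uniform random permutation."""
    probs = [1.0]  # the empty permutation has zero cycles
    for size in range(1, n + 1):
        new = [0.0] * (size + 1)
        for k, p in enumerate(probs):
            new[k + 1] += p / size            # new element forms its own cycle
            new[k] += p * (size - 1) / size   # new element joins an existing cycle
        probs = new
    return probs


def normal_approximation(n, N):
    """Local normal approximation with parameters (log n, log n)."""
    u = (N - log(n)) / sqrt(log(n))
    return exp(-u * u / 2.0) / sqrt(2.0 * pi * log(n))
```

Because the parameters grow only like log n, agreement between `cycle_count_distribution(n)[N]` and `normal_approximation(n, N)` improves slowly with n, which is one motivation for the sharper local theorems that follow.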
184 Random permutations Thus, to study the asymptotic behavior of the distribution of vn, it is sufficient to obtain the corresponding local limit theorems for the sum $N = §1 H + %N, where the parameter x in the distribution of the summands can be chosen so that obtaining the local theorems becomes simple. We begin with x = 1 — \/n and prove a series of limit theorems that make it possible to describe the behavior of the probability P{vn = N} for the values of TV not too far from log n. Theorem 4.2.1. If n ^ oo, N = y \ogn + o(logn), where y is a constant, 0 < y < oo, then = k} = Ll ze(l + nT{y) uniformly in the integers k such that z = k/n lies in any interval of the form 0 < zq < z < z\ and zq and z\ are constants. Before proving the theorem, we obtain some auxiliary results. We have chosen x = 1 — \/n. For such x, (l~l/n) , ?=1,2,..., D.2.3) *} k\ogn and the characteristic function of the random variable ?i equals (pn(t) ^\og(le + e). \ogn \ n ) Represent <pn (t) in the form (Pn(O = -—i- (log (- - it) + log(l + Vi(O + V2@)) , D-2.4) where elt - 1/n-it n(\/n-it) For ir\(t) and ^2@- the following estimates are valid: D.2.5) |el| 1 < -r1 < -• D.2.6) n t n
4.2 The number of cycles 185 By using the explicit form of <pn(t), the representation D.2.4) and the bounds D.2.5) and D.2.6), we obtain the following estimates of (pn(t). Lemma 4.2.1. If n —> oo, N = ylogn + o(\ogn), where y is a constant, 0 < y < oo, then for any fixed t, *T ( t\ 1 r" \n) {\-ity Lemma 4.2.2. If n ->¦ oo, TV = y\ogn + o(logw), where y is a constant, 0 < y < oo, then there exist positive constants s and c such that for \t/n | < s, Lemma 4.2.3. //"«—> oo, then for 0 < s < \t\ < it, where s is an arbitrary constant, there exists a constant c such that for sufficiently large n, \<pn(t)\ <c/\ogn. Lemma 4.2.4. If n ^ oo, then there exists a positive constant s such that for \t/n\ < s, 2 As follows from Lemma 4.2.1, as n —> oo and N = y \ogn +o(log«), where y is a constant, 0 < y < oo, the distributions of the normalized sums ^n/k converge to the gamma distribution with characteristic function A — it)~Y and density zY~le~zI F(y), z > 0. Actually, as stated in Theorem 4.2.1, these distributions become close locally. Proof of Theorem 4.2.1. By the inversion formula, the probability can be represented in the form 1 Cnn n = z} = -- e-itz^(t/n) dt, -nn and 1 /»oo ^-itz dt. i'(y) j^ii-ity Hence, 27tnP{SN/n =z}~ 2jte~z = h + h + h + h,
186 Random permutations where /, = f e-itz(tp^t/n)-(\-it)-y)dt, J-A h = f e~itz<p^(t/n)dt, J A<\t\<sn = f = - f e-itz(\-ityydt, JA<\t\ ' A<\t\<sn h Isn<\t\<nn h 'A<\t\ with the constants s and A to be chosen later. By Lemma 4.2.1, (fj?(t/n) -> A - it)~y for any fixed t. By Theorem 1.1.9, this means that the convergence is uniform with respect to t in any finite interval. Therefore I\ —> 0 for any fixed A as n —> oo. By Lemma 4.2.3, for sufficiently large n, \h\< 27tn{c/\ogn)N < 27tne~2N/y, and, for TV = y log n + o(log n), the right-hand side tends to zero as n —> oo. To estimate h and I4, we integrate by parts. For I4, this leads to /•oo -itz °° y r / e-ltz(\ - ityy dt = - e + y- \ Ja iz(\ -ity A z jA dt. IZ{\ -lt)Y A Z JA Therefore dt < 2y_ r z JA 2 JA A U -f 00 dt CA tr+i - Ay' where a, is a constant, and I\ can be made arbitrarily small by the choice of sufficiently large A. Similarly, '?-^^dt p-itz en 12 \n ) A izn jA
4.2 The number of cycles 187 By using the estimates of Lemmas 4.2.2, 4.2.3, and 4.2.4, we obtain 2 \h\ < - AY 'A n + N 2N rsn in JA \\ognj J JA A 00 dt v?-1 ,n dt where c, C2, and C3 are constants. If we choose sufficiently large A and n, we can make I/2I arbitrarily small. ¦ Now we can prove the following theorem on the behavior of the probability P{vn = N). Theorem 4.2.2. Ifn —> 00 and N = y logn + o(\ogn), where y is a constant, 0 < y < 00, then Proof. For* = 1 — \/n, the representation D.2.2) takes the form vn = N}= = n), D.2.7) N\(l- l/n)n where ^ = ^i + • • • + %n is the sum of independent identically distributed random variables with distribution D.2.3). By Theorem 4.2.1, 1 _, nT(y) By substituting this expression into D.2.7), we obtain the assertion of Theo- Theorem 4.2.2. ¦ The case where y = N/ log n —> 0 is described by the following theorem. Theorem 4.2.3. Ifn —> 00 and y = N/ logn —> 0, then ,-1 nV(Y) A+0A)). Proof. Taking into account that y < 1/2 beginning with some n, we choose the level n(l — y) and represent the probability P{?w = n} as follows: D.2.8) =n) = P{t;N = n, & <n(l-y), i = 1,...,«} + NP{rN = n,SN>n(l- y)}.
188 Random permutations Since ?i =m} = A n log n uniformly in m, n > m > n{\ - y), we see that > n{\ - y)} = 2_^ P{%N = m, Kn-\ = n — m) m>n(\—y) 6 -P{^-i <yn}(\+o(\)). D.2.9) We now prove ^ 1. D.2.10) Show that the random variable ^/(y«) converges in probability to zero. By the representation D.2.4) and the estimates D.2.5) and D.2.6), logn \ \n ynj log(y - f Q - log y | log« \y« log«/ ' and if y = N/ log n —> 0, then * + 0 f ,. \yn\ognJJ Thus, the characteristic function of ^/(yn) converges to the characteristic func- function of the random variable that assumes the value 0 with probability 1, and we obtain D.2.10). With some technical difficulties, it can be proved that under the conditions of the theorem, P{Sn=n, & <n{\-y), i = 1, ...,«} = o{\/{n log/i)). The assertion of the theorem follows from this relation and the relations D.2.8), D.2.9), and D.2.10). ¦ Theorem 4.2.4. Ifn ->¦ oo and y = N/ \ogn ->¦ 0, then N\n Proof. The assertion of the theorem follows immediately from Theorem 4.2.3 and representation D.2.7) if we take into account that the gamma function T{y) = l/y(l+o(l))asy -> 0. ¦
4.2 The number of cycles 189 Now consider the case where N/ \ogn —> oo. We distinguish four subcases: a = n/N —> oo, a —> c > 1, a —> 1 with m = n — N -> oo, and a —> 1 with m fixed. Let a —> oo. We must select the value of the parameter x so that E?>/ is close to n. Since for ?i with distribution D.1.2), x (l-x)log(l-x)' we choose x, 0 < x < 1, such that — =<*, D.2.11) where a = n/N. This equation is approximately satisfied if we take 1 x = 1 - a log a If TV/ log« —> oo, then x = 1 — l/(a log a) is farther from the singular point x = 1 than x = \ — l/n, and therefore the normal approximation is valid for the sum ?>/. Theorem 4.2.5. Ifn, N —> oo such that N/ log« —> oo, a = w/Af —> oo the parameter x = 1 — l/(a log a) and cra = ctyf log a, then = k} =^22 7a v 2jt N uniformly in the integers k such that z = (k — n)/(aa^/N) lies in any fixed finite interval. Proof. The characteristic function of the random variable ?i is l -xe log(l -x) It is easy to see that for any fixed t, as N/ log n —> oo and a = n/N —> oo, Denote by rfrn(t) the characteristic function of ^ = (^ — n)/(oa*/N), then under the conditions of the theorem for any fixed t, and the distribution of ^ converges weakly to the normal distribution with pa- parameters @, 1).
190 Random permutations The local convergence can be proved by the standard reasoning and we omit this technical part of the proof of Theorem 4.2.5. ¦ From Theorem 4.2.5 and representation D.2.2), we obtain the following asser- assertion. Theorem 4.2.6. Ifn, N -> oo such that N/ \ogn -> oo, a = n/N ->¦ oo, then where x = 1 — l/(a log a) and aa = as The following theorem for the case where a tends to a constant greater than 1 can be proved in the same way as Theorem 4.2.5. Theorem 4.2.7. If n, N —> oo and there exist constants «o and a.\ such that 1 < ao < a < ct\, the parameter x = xa, where xa is the unique solution of equation D.2.11) in the interval @, 1), and v I r\rr I 1 "V" I I y ^ ax = ~— = ' then = k} = ]==e-z2/2{\ + 0A)) V2N uniformly in the integers Jc such that z = (k — n)/ (ax \j2it N) lies in any fixed finite interval. Proof. The proof is similar to the proof of Theorem 4.2.5 and we omit the details. Note only that a — E?i and a2 = D?i for x = xa. ¦ Using Theorem 4.2.7 and representation D.2.2), we obtain the following asser- assertion on the distribution of vn. Theorem 4.2.8. If n, N —> oo and there exist constants ao and a\ such that 1 < ao < a < a\, then where xa is the unique solution of equation D.2.11) in the interval @, 1), and >! -xa) The asymptotic normality of ?# is preserved if a = n/N ->¦ 1 slowly, as specified below.
4.2 The number of cycles 191 Theorem 4.2.9. Ifn, N -> oo such that a = n/N ->¦ 1 am/ m = n — N -> oo, and the parameter x = xa, where xa is the unique solution of equation D.2.1 I) in the interval @, I), then =k} = J2/2 uniformly in the integers k such that z = (k — n)/-sfm lies in any fixed finite interval. The proof is similar to the proof of Theorem 4.2.5 and we omit the details. From Theorem 4.2.9 and representation D.2.2), we obtain the following asser- assertion on the behavior of P{vn = N}. Theorem 4.2.10. Ifn, N —> oo such that a = n/N —> 1 and m = n — N —> oo, then where xa is the unique solution of equation D.2.11). It is not difficult to see that if m2/N —> 0, then and consequently (- log(l - xa))N = x?(l + xa/2 + O(x2a))N = x»em(\ + O(m2/N)), Nm Therefore it follows from Theorem 4.2.9 that if n, N —> oo, a = n/N —> 1, m —> oo and m2/N —> 0, then Nm P{vn = N}= ,(l+ TV! 2mm\ Finally we consider the case where m is bounded. Theorem 4.2.11. If N ->¦ oo and the parameter x = \/N, then for any fixed k = 0,1,..., Proof. By expanding the characteristic function <p(t) of the random variable with parameter x = 1/N, we obtain for any fixed t,
192 Random permutations If x = \/N and TV —> oo, then the characteristic function of ^ — N is equal to A _|_ (eit _ \)/BN) + O(N~2))N and tends to e(e"~{)/2. This means that the distribution of %n — N converges to the Poisson distribution with parameter 1/2. From this theorem and representation D.2.2), we obtain the following assertion, which completes the description of the asymptotic behavior of the distribution of vn. Theorem 4.2.12. Ifn -> oo, n/N -> 1, and m = n — N is fixed, then P{Vn=N}= JT N It is not difficult to see that Theorems 4.2.2, 4.2.4, 4.2.6, 4.2.8, 4.2.10, and 4.2.12 give a complete description of the asymptotic behavior of the distribution of the number of cycles in a random permutation of degree n as n -> oo. 4.3. Permutations with restrictions on cycle lengths In this section, we present some results on permutations with restrictions on their cycle lengths. Let R be a subset of the set of natural numbers. We consider the set Sn<R of all permutations of degree n with cycle lengths from the set R. One of the first questions that arises in this situation concerns the asymptotic behavior of the number an7r of elements in Sn,r. This problem is far from being completely solved. Here we describe some of the solutions provided by an approach based on the generalized scheme of allocation. Let the uniform distribution be defined on Sn 7r and let vn7r be the total number of cycles in a random permutation from this set. Put bn,R = (n — 1)! if n e R, and bn,R =0 otherwise. It is easy to see that PK, = N) = -2l_ T ^'¦'I"V. D.3.1) n\-\ \-n^=n We introduce independent identically distributed random variables r with distribution where X^ bkRxk \r^xk A = Li' x>0- k=l K- keR
4.3 Permutations with restrictions on cycle lengths 193 By using these random variables, we can rewrite D.3.1) in the form D-3.3) Hence, summing over TV, we obtain D.3.4) N=l It is clear that above we have repeated the general approach of Section 1.3 for the case of the set Sn,R, and relations D.3.1), D.3.3), and D.3.4) are the realizations of the general relations A.3.1), A.3.10), and A.3.11), respectively. To find the asymptotics of the numbers an^r, it is sufficient to choose an appro- appropriate value of the parameter x, substitute it into the expression of the distribution D.3.2), and then prove a local limit theorem for the sum of independent random variables with this distribution. We succeed in obtaining results on an^ only if the structure of R has some regularity. In the general case, the asymptotics of #„,# is unknown. To demonstrate the approach, we consider first a simple case where R is the set E of even numbers. Theorem 4.3.1. Ifn^-oo, then an E = 2 (-Y (\ + o(\)) D.3.5) for even n, and anyE = 0 for odd n. Proof. To prove the theorem, we use the representation D.3.4). We consider the random variables ?j ,..., ?Jy with distribution D.3.2), where R = E = {2, 4,...}, and X2k j BR(x) = BE{x) = ]?_ = -_ log A - x2). keR The random variables ?,- = ?• /2, / = 1,..., N, are independent identically distributed, and X2k =-, ?=1,2,.... D.3.6) -x1) If we choose x = j\ — \/n, then this distribution coincides with distribution D.2.3) from the previous section, and according to Theorem 4.2.1, if n -> oo, N = y logn + o(logn), where y is a constant, 0 < y < oo, then nT{y)
194 Random permutations uniformly in the integers k such that z = k/n lies in any fixed interval of the form 0 < zq < z < z\, where z$ and z\ are constants. Since we obtain that if n —> oo, TV = (log«)/2 + o(logn), and n is even, then a/2 + • • • + Hn = n/2} = -^-~e~[l2{\ + o{\)). D.3.7) For odd n, this probability equals zero. To obtain an,R with the help of relation D.3.4), we have to sum the probabilities P{?jV n) w^m ^e Poisson coefficients. To this end, we need to estimate these probabilities for all N. We show that for all N, n logn This bound is a consequence of the following chain of estimates. It follows from D.3.2) that _ y^ 1 K(n,N) l " ' N where K(n, N) = {k\, ... ,kx: h -\ \-kfj = n, k\, ..., k]^ e R}. Hence, P< X" xkl < N-l N I x-^ jc* \ N nBE(x) We obtain relation D.3.8) because B = BE{x) = (log/i)/2. We split the sum
4.3 Permutations with restrictions on cycle lengths 195 into four summands, dividing the domain of summation into four parts: A\ = {N: \ <N < B - B3/4}, A2 = {N: B - B3/4 < N < B + B3/4}, A3 = {N: B + B3/4 < N < B + B2}, A4 = {N: B + B2 < N <n/2}. It is not difficult to see that relation D.3.7) is satisfied uniformly in N € A2. Therefore NeA2 2 1/?1 v^ BNe~B V2 TI/2 T (l+(l)) ^ n n f—' N\ NeA2 since B = (logn)/2, and as N -> 00, y AM NeA2 The remaining part of the sum is o(\/n). Indeed, by applying estimate D.3.8), we obtain IB ^ BNe-B 1 ^ BNe~B Si < 2^ —j— = ~ and Si = o{\/n) because E^-< NeA\ as n —> co. It follows from D.3.8) that BNe~B If we use the normal approximation for the Poisson distribution, we find that dN^-B poo poo <c,f it, N i where ci and c2 are constants. Hence, ^3 = o{\/n).
196 Random permutations Similarly, by using D.3.8), we obtain -)V'. Hence, S4 = o{\/n) because e/B < e~l for n sufficiently large. If we combine the estimates of S\, S2, S3, and 54, we obtain Substituting this expression into D.3.4) and expanding n! by the Stirling formula give the assertion of the theorem. ¦ The analogous result is valid for the number of permutations for which R is the set of odd numbers. We turn now to the case where the set R is not as regular as E. Let R(k) be the number of elements of R that are not greater than k. Set i?@) = 0. In the sequel, we assume that lim R{k)/k = p, 0 < p < 1. In this case, p is called the density of R in the set of natural numbers. We will find the asymptotics of an^ under the following additional conditions on the set R. A) There exists a positive integer r such that, for any nonnegative integer s, the set R n {5 + 1,..., 5 + r] cannot be embedded in any integer lattice with a step not equal to 1. B) The generating function F(z) of the set R has a finite number m of poles at the points z/ = e2jtil/m, I = 0, 1,..., m - 1, on the unit circle \z\ = 1; in other words, it is of the form keR where P{z) is a polynomial. Note that, since the coefficients of the series F{z) take a finite number of values, by Szego's theorem (see, for example, [19]), there are only two possibilities for F(z): Either F{z) has the form D.3.9), or the set of singular points of F{z) is dense everywhere on the unit circle, and therefore F(z) cannot be extended outside the unit circle. We consider here only the first case. In this case, the coefficients of F(z), with exception of some initial numbers, form a periodic sequence with
4.3 Permutations with restrictions on cycle lengths 197 period m, and, therefore, the set R has density p = l/m, where / is the number of units in the period. Consider independent identically distributed random variables ?1, ...,?# with distribution , keR, D.3.10) kB(x) where _ 1 n' Theorem 4.3.2. Suppose that R has the density p > 0 and satisfies conditions A) and B), n —> oo, N = p logn + o(logn). Then uniformly in the integers k such that y = k/n lies in any fixed interval of the form Q < yo < y < yi < oo. With the aid of Theorem 4.3.2 and relation D.3.9), we prove the following assertions. Theorem 4.3.3. Suppose that R has the density p > 0 and satisfies conditions A) and B). Then, as n —> oo, a»,R = {n- l)!A"/r(p)(l + 0A)), D.3.11) where Since X!J?iO ~ \/n)k/k = logn, the assertion D.3.11) can be written in the form where Theorem 4.3.4. Suppose that R has the density p > 0 and satisfies conditions A) and B). Then, as n —> oo, v«,^ = N} = -== exp —— A + o jlB \ 2B J
198 Random permutations uniformly in the integers N such that {N — B,^r)/' yjBnj+ lies in any fixed finite interval. To prove Theorem 4.3.2, we establish some auxiliary results. The characteristic function of distribution D.3.10) is _ y^ xkeitk B(xeil) ~ h hikB(x) B(x) Lemma 4.3.1. If R has the density p > 0, then, as n —> oo, <p I - = 1 1- o \nj logn \logn/ for any fixed t. Proof. We first derive some auxiliary estimates. It is easy to see that oo keR k=\ oo oo k=\ k=\ oo k=\ Set s = log n. For such e, xkR(k) < J^ k<\og2n, l<k<e l<k<e and, since R has positive density, k>e k>e k>e Thus, as n —> oo, oo i 2 ten i=i V " D.3.12)
4.3 Permutations with restrictions on cycle lengths 199 Similarly we obtain the estimate k B(X) = J2— =p\ogn + o(n). D.3.13) keR We now write the characteristic function in the form It is easy to see that B(xeit/n) - B(x) oo = E -rxk(eitk/n ~ l)(R(k) - R(k - D) k=\ = f; -xkR{k) (eitk'n - 1 - JtLfcW+W - I) k=\k 00 1 / 1 = Y -xkR{k) (eitk'n(\ - ei[/n) + I(e j—* k \ ' n ft== 1 First of all, we estimate the part that does not contribute essentially to the sum. If t is fixed and n —> oo, then 00 k x x R(k) {i We transform the other parts of the sum as follows: OO k=\ k kRik) , y> xR(k)citk/n L , ^ tk \ kn *—' k \ n k=\ oo D.3.14) ( K— 1
200 Random permutations and 00 k=\ 00 oo k=l oo kn k—\ kn Similarly, D.3.15) 00 Set s = logn and E = n logn. Then k<e kn logn k<e n k<e kn --I) f k<e n In exactly the same way, E kn ,itk/n _ j\ <-2Tkxk < n2tE -i) V k < - > jc* < k>E It is clear that R{k)/k =
4.3 Permutations with restrictions on cycle lengths 201 uniformly in k, s < k < E. Hence, itk/n = kn e<k<E e<k<E = p T e<k<E Similarly, _ j) = kn , r e<k<E e<k<E -l) = p ^ kn e<k<E " e<k<E The sums in the right-hand sides of these relations are integral sums of integrable functions. Therefore, as n —> oo, their limits exist and equal 'OO 1 I 1 .' * \ -r J- ./o 1 - It "z-')^=r477-1- f°° -e~z(eitz -\)dz= - log(l - it), JO 2 V J respectively. Thus, as n —> oo, for any fixed ^, B(xeit/n) - B{x) = -p log(l - it) + o(l), and hence, logn Lemma 4.3.1 implies that for any fixed t, as n —> oo and N = p log n + o(log n), and for the normalized sum (^i + • • • + %n)/k the limit distribution is the distribu- distribution with the characteristic function (l — it)~p that has the density y p~l e~y/ T(p). To prove the local convergence of the distributions, we have to estimate <p(t/n) outside a neighborhood of zero.
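The integral with the value $-\log(1 - it)$ that appears at the end of the proof of Lemma 4.3.1 is a Frullani-type integral, and it can be checked numerically. The following is only an illustrative sketch, not part of the original argument: the function names, the truncation point `upper`, and the number of steps are assumptions made for this example. It approximates $\int_0^\infty z^{-1} e^{-z}(e^{itz} - 1)\,dz$ by the midpoint rule and compares the result with $-\log(1 - it)$.

```python
import cmath


def frullani_integral(t, upper=60.0, steps=200_000):
    """Midpoint-rule approximation of the integral over (0, upper) of
    exp(-z) * (exp(i*t*z) - 1) / z dz.

    The integrand tends to i*t as z -> 0, so the singularity at zero is
    removable; the neglected tail beyond `upper` is of order exp(-upper).
    """
    h = upper / steps
    total = 0j
    for j in range(steps):
        z = (j + 0.5) * h  # midpoints never hit z = 0
        total += cmath.exp(-z) * (cmath.exp(1j * t * z) - 1.0) / z
    return total * h


def closed_form(t):
    """The value -log(1 - i*t), principal branch of the logarithm."""
    return -cmath.log(1 - 1j * t)


if __name__ == "__main__":
    for t in (0.5, 1.0, 2.0):
        print(t, frullani_integral(t), closed_form(t))
```

For each fixed $t$ the two printed complex numbers should agree to several decimal places, which is consistent with the limit characteristic function $(1 - it)^{-\rho}$ obtained above.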
202 Random permutations Lemma 4.3.2. Suppose that R has the density p > 0 and satisfies conditions A) and B). Then, for any s > 0, there exists q < 1 such that for s <\t\<n, \<P(O\ <q- Proof. Let k\, fa, and fa be integers and ak{, akl, ak? > 0. It is easy to verify that akle itkl = (ak{ +akl+ahJ - 2ak] akl{\ — cos t (fa — k\)) - 2ak[ ak3(l - cos / (fa - fa)) - 2aklak?>(\ - For a > 0 and <5 > 0, Therefore \- akl + ak3 - aklak2(l - cost(k2 - <7*> + ak2eitk* + ak3e -COSf (fa -fa)) akl -\- ak2 -\- aklak3{\ - cos t(k3 -fa)) D.3.17) Suppose now that, as in condition A), the integers fa, fa, and fa do not lie on any lattice with a step greater than 1 and are contained in an interval of length r. Then, for s < \t\ < n, the three cosines from the right-hand side of D.3.17) do not simultaneously take the value 1. Moreover, since fa, fa, and fa are contained in an interval of length r, their differences can take only a finite number of values. Therefore, there exists a > 0 such that for s < \t\ < n, > 3a D.3.18) uniformly in all such fa, fa, and fa. We now let ak = xk/k, k = 1,2,..., and suppose condition A) holds for fa > fa > fa. It follows from D.3.17) and D.3.18) that ak] + akl + ak3 - akle ak3e itk3 > >aarakl D.3.19)
4.3 Permutations with restrictions on cycle lengths 203 Write the characteristic function <p(t) in the form oo rl+r 52 -^(R(k)-R(k-\))eltk 1=0 k=rl+\ B(x) From every set {rl + 1, • • •, rl + r), select, according to condition A), three inte- integers k\i, k2i, and ky from R that do not lie on any integer lattice with a step not equal to 1. Using estimate D.3.19) gives akueltku + ak2lelthj + akveltk3l > aarakli > aarari+r. Therefore, taking into account that R(kn) — R(kn — 1) = 1 for /' = 1, 2, 3 and / = 0, 1,... yields oo rl+r B(x)\<p(t)\ < 1=0 k=rl+\ oo 1=0 oo 1=0 +ak2i +akv) akuettk" + ak2leitk* 00 D.3.20) Inequalities D.3.20) imply the assertion of Lemma 4.3.2 because r is fixed, x = 1 — \/n, and, as n —> 00, 00 B{x) = p\ogn + o{\ogn), ) — = -log(l -xr) = \ogn + o Lemma 4.3.3. Suppose that R has the density p > 0 and satisfies conditions A) and fB). Then there exist c\ and e > 0 such that for every 1 = 0, \,... ,m — 1 and \t/n — 2nl/m \ < e,for sufficiently large n, 1 n c\ - 2nln/mJ logn Proof. We start by estimating n
204 Random permutations By condition B), there exist c, <5 > 0 such that for \z\ < 1, |z/ — z\ < 8, I = 0, 1, .. .,m - 1, keR 1 = 0, l,...,m - 1. D.3.21) Set z = xelt/n, where x = 1 — 1/n. It is clear that \z\ < 1 and there exists s > 0 such that if |f/n — 2jtl/m\ < e, then |z/ — z\ < 8 for sufficiently large n. Therefore, D.3.21) implies that for \t/n - 2nl/m\ < s, I = 0, 1,..., m - 1, keR cn B(x)\\ - cjn B{x)J\ + (t- 2nln/mJ Since B(x) = plogn + o(logn), there exists c\ such that for every / = 0, 1,. m — 1, if \t/n — 2nl/m\ < s, then c\n logn^l + (t -2nln/mJ for sufficiently large n. We now proceed to estimate the characteristic function cp(t) in the intermediate range of t. Obtaining the estimate involves some technical difficulties. So, for the sake of greater clarity, we first treat the case R = N. In this case, Pk = =k} = kB(xY ?=1,2,..., B(x) = - Consider the random variable f m > 0, = ?i — ?2- Its distribution is symmetric, and for Pm = oo = y^jPkPk+m- k=l Let 00 cos^m. It is clear that the characteristic function <p(t) of the random variable ?1 is related to ip(t) by the equality <p(t) = \cp(t)\2. To estimate ip(t), we use a standard inequality
4.3 Permutations with restrictions on cycle lengths 205 (see, e.g., [49]): For; > 0, oo oo 1 — (p{t) = 2 / Pm\\ ~~ costm) >2/ y pm, D.J.ZZ) m = \ s=0 meM.i where In 2ns In 2ns 1 m: 1 < m < 1 \ . 2t t ~ ~ 2t t \ Lemma 4.3.4. For mo > 0, oo 2 2_^ pm > /, PI /,Pk- m>m0 l>2m0 k=l Proof. By using / = m + k as the variable of summation, we obtain oo I—mo oo oo m>mo k=\ l>mo+\ k=\ 1=1 k=l+mo D.3.23) The right-hand side of D.3.23) is estimated from below by the quantity oo l>2m0 k=l l>2m0 To see this, it is sufficient to delete the first terms in the first sum from D.3.23), retaining l>2m0 k=l and, in the second sum from D.3.23), to shift the domain of summation to 2mo, giving oo l>2mo k=l-mo+l which does not exceed the second sum from D.3.23) by the monotonicity of the probabilities. ¦ Lemma 4.3.5. For 0 < t < n, 1 v^ t 3 2^ Pk- k>it/t
206 Random permutations Proof. Note that the summation on the right-hand side of D.3.22) occurs over integers m from an interval of length n/t. If we enumerate intervals of such a length on the positive semi-axis starting at the point Jt/Bt), the domain of summation will consist of the intervals labeled by odd numbers. Notice that the sequence of probabilities pk, k = 1, 2, ..., is monotone, and the numbers of integer points in any two intervals of length n/t differ by at most 1. Therefore each interval of length n/t for 0 < t < tt contained in the right-hand side of the sum D.3.22) contributes not less than one-third of the total sum of the two following intervals: the interval itself and the interval adjoined to it on the right side, which does not belong to the initial domain of summation. (Note that, as t -> oo, the number of integer points in one interval increases and its contribution to the sum tends to 1/2.) Therefore, D.3.22) implies oo Pm > ^ s=0meMs m>n/Bt) By applying Lemma 4.3.4, we obtain the assertion of Lemma 4.3.5. ¦ It remains to estimate the sum of the form Ylk>a Pk from below. If we use the inequality we obtain T- k>a co -y D.3.24) where c-x, is a constant. We use Lemma 4.3.5, set a = nn/\t\ in D.3.24), and obtain for \t\/n <n, - > pi > C3 - log nn where a, is a constant. Hence, we go on to estimate <p(t/ri) and find that ft" V n 3logn < ( 6logn If N > \ logn, then N 1 <exp|-—log|f| + —— [ 12 12 log } < cskr1/12. D.3.25)
4.3 Permutations with restrictions on cycle lengths 207 We now return to the case R C N. We retain the notation cp{t) and (p{t) for the characteristic functions and set _... ,, akxk y-^ . 1 Pk = P{?i = k) = , keR, B(x) = > airX , ak = -; B(x) ^—' k v keR 8R(k) = 0 for k g R, and<5^(/:) = 1 for k e R. Lemma 4.3.6. Suppose that R has the density p > 0 and satisfies conditions A) and B). Then, for \t\/n <n and N > ^p\ogn, where r is defined in condition A) and c$ is a constant. Proof. We revise the arguments leading to estimate D.3.25). Inequality D.3.22) now takes the following form: For t > 0, ~ oo oo where 2ns In 2ns 1 \ ITT 2ns In 2^ t ~ ~ 2t t \ We retain only one summand in each interval of length r, replace this summand by the minimum value over the interval, and use the transition from the sum over one interval of length r to one-third of the sum over the interval of twice the length. Then we obtain for t > 0, oo J2T,ak+mXk+mSR(k + m)>- ? ak+rlxk+rl. s=0meMs rl>7t/Bt) Once again, we preserve only one summand in each interval of length r and get 2 °° ak+nxk+rl k=l l>n/Btr) oo .rm+rl urm+rlJ l>n/Btr) E oo x™ 3B2(x)r2 *-< ^ m m+l' l>it/Btr) m — \ The assertion of Lemma 4.3.4 is based on the monotonicity of the probabili- probabilities pk, k — 1,2, The summands of the last double sum are similar to the
208 Random permutations summands of the sum in Lemma 4.3.4, and the values xrk/k, k = 1,2 are also monotonic. Therefore we may use Lemma 4.3.4 and obtain c™ log(l -jcO / l>7t/Btr) m=\ m ,2 Z_^ / l>7t/(tr) For a fixed r, the estimate D.3.24) remains true. Therefore, by taking into account the asymptotics B(x) = p\ogn +o(\ogn) and - log(l — xr) = \ogn + o(log«), we find 2 i i \ 1 - t n 1 3r2p2\ogn Hence, '?. \og\t\ and for N > ^p log n, -l/A2r2p) where C6 is a constant. Proof of Theorem 4.3.2. Consider the sum ?w = ?i + • • • + ?w of independent identically distributed random variables with distribution D.3.10). As we have seen, Lemma 4.3.1 implies that, as n ->¦ oo and iV = p\ogn + o(logn), the distribution of ?#/n converges weakly to the distribution with density u>0. We now prove the local convergence of these distributions. For an integer k, let y = k/n. By the inversion formula, n where ^@ is the characteristic function of the distribution D.3.10). The density of the limit distribution at a point u > 0 can be represented by the integral 00 1 ritu du. Hence,
4.3 Permutations with restrictions on cycle lengths 209 where h h h - - = f 1 n -ity. - ay >N ('-) dt, (i -ay dt. dt, A<\t\<nn and the constant A in the integrals is to be chosen later. By D.3.16), for any fixed A, the integral I\ tends to zero as n —> oo and N = p log n + o(log n). To estimate the integrals I2 and I3, we integrate by parts. For I2, this yields '00 p-ity (i - dt = -¦ iy{\ - ity 00 -yh ,-ity dt. Hence, \h\< dt and I/21 can be made arbitrarily small by the choice of A. Similarly, ryrn JA t\ . e~"y N t - ) dt = <pN - n) iy V«, where _ n rn iy Ja t\ 1 ,(C -)-<p I - N N Therefore 2 When we use the estimates of Lemmas 4.3.2 and 4.3.6, we obtain \I\- q< 1; N \<p(A/n)\N < Hence these summands can be made arbitrarily small. It remains to estimate the integral /. Choose e such that Lemma 4.3.3 is valid, and represent / as the sum of three integrals: I2(s)
210 Random permutations where TV f ly Ja A<\t\<en ''-Y-V'('-)"'¦ n I n \n ) hie) is the integral over the sum of ^-neighborhoods of the poles of F(z), that is, over the sum that equals m-\ u 1=1 2nln 2nln~\ —en -\ , en -\ , m m J and 73 (e) is the integral over the remaining set . r \ i i f 2nln 2nln Ae = {—nn, —en] U [en, nn]\ I I —en -\ , en -\ ML m m By using Lemmas 4.3.3 and 4.3.6, we find 2Nc\c6 c_6 r n Ja 1 A A+-'2I/2 and for y > yo > 0, the value |/i(e)| can be made arbitrarily small by the choice of a sufficiently large A. By using Lemma 4.3.3, we find i pen+27tln/m If n J- —en+2nln/m A1- U dt < — f logn J_ en+lxln/m dt - 2nln/mJ logn ren dt J-en Vl +t2' and there exists a constant c-j such that for a fixed e, dt '-en Vl + t2 p J - < c-j logn. Therefore, we use the estimate of Lemma 4.3.2 and find that for y > yo > 0, Nl en dt \h(e)\ < ylogn and under the conditions of Theorem 4.3.2, the right-hand side tends to zero. For t e Ae, \<p\t/n)\ < where eg is a constant that is the upper bound of \F{z)\ for \z\ = x not in the
4.3 Permutations with restrictions on cycle lengths 211 neighborhoods of the poles. By using this estimate and the estimate of Lemma 4.3.2, we find < N y Ja, <p N-\ N-\ ynB{xV n L 1 n dt dt Under the conditions of the theorem, the last term of this chain of inequalities tends to zero for y > yo > 0. ¦ It is easy to see that by first choosing a sufficiently large A and then a sufficiently large n, we can make the difference being estimated arbitrarily small. Note that the difference is bounded uniformly with respect to N, and hence, there exists a constant eg such that for y > yo > 0 and for all N, = k}< c9/n. D.3.26) Proof of Theorem 4.3.3. In D.3.8), divide the domain of summation into two parts: N{ = {N: \N - B(x)\ < N2/3} and N2 = {N: \N - B(x)\ > N2/3}. It is not difficult to see that the assertion of Theorem 4.3.2 is fulfilled uniformly in N € N\. Therefore uniformly in N e N\, so ,-1 nT{p) We use the estimate D.3.26) and obtain NeN2 N\ n NeN2 N\ Since the sum on the right-hand side of this inequality tends to zero, the total sum in D.3.4) equals (enT{p)yl{\ + o(l)). It remains to note that \ () x" = e~\\ + o(l)), B(x) = Bn<R = keR n
212 Random permutations Proof of Theorem 4.3.4. According to D.3.3), n\(B(x))N P{vn,R = N}= I )} N\xnan,R If we substitute the corresponding expressions for an r and P{Ov = n}, we obtain for N = B(x) + o{B{x)). We note that B{x) = Bn<R and that the expression obtained above holds uniformly in N such that (N — B(x))/*JB(x) lies in any fixed finite interval; thus, we obtain the assertion of Theorem 4.3.4. ¦ 4.4. Notes and references The probabilistic approach that is now commonly used in combinatorics was first formulated in an explicit form and applied in the investigations of the symmetric group Sn by V. L. Goncharov [51, 52, 53]. For the random variables ct\,... ,an, he found the joint distribution D.1.4) and the generating function D.1.5). For the total number of cycles vn = a\ + ¦ • • + an, he proved that, as n —>• oo, Evn = \ogn + y + fl - (n2/2 - y/2)/ Goncharov also proved that the distribution of {vn — \ogn)/^/\ogn converges to the standard normal distribution, and the distribution of ar converges to the Poisson distribution with parameter 1/r. Let f3Vn be the length of the maximum cycle in a random permutation from Sn. Goncharov [51, 53] showed that h=0 where S0(m,n) = l, Sh(m,n) = Let I0(x,\-x) = l, Ih(x,l-x)= I l" h dx\ X\-\ X\,...,Xh>X
4.4 Notes and references 213 Goncharov proved that, as n ->• oo, the random variable fiVn /n has the distribution with the density ^' / 1 \h 1 i which, as is clear from the preceding formula, is defined by different analytic expressions on the sequential intervals of the form [1/A + k), I/A], where k is an integer. For example, x 2 1 1 3 2 Although Goncharov investigated the cycle structure of random permutations in great detail, these problems continue to be of significant interest to mathemati- mathematicians. V. F. Kolchin [71] proposed an approach based on the generalized scheme of allocation. The results on the asymptotic properties of random permutations obtained with the help of this approach are presented in [78]. Note that, among the others, the asymptotic logarithmic normality of the middle terms of the series of order statistics composed of the lengths of cycles, and the local limit theorem on the convergence of the distribution of the total number of cycles vn to the nor- normal distribution were first proved by this method. It is clear that this approach makes it possible to investigate the asymptotic behavior of the local probabilities P{vn = N} for all possible values of N = N(n) as n ->• oo. These investiga- investigations were carried out in [109, 115, 117, 146, 147, 148]. In Section 4.2, the results of these investigations are presented. Theorems 4.2.1 and 4.2.2 were proved by Yu. Pavlov in [115,117]; and Theorems 4.2.5,4.2.6,4.2.9,4.2.10, and 4.2.12 were proved by L. M. Volynets in [146, 147, 148]. Methods of estimating the rate of convergence in limit theorems for sums of independent random variables are well developed in the theory of probability. Therefore the approach that reduces the study of characteristics of random per- permutations to problems concerning the sums of independent summands provides an obvious way to obtain the limit theorems containing estimates of the rate of convergence. 
The estimates under the conditions of Theorem 4.2.1 were obtained by Yu. Pavlov [117] and for $\gamma = 1$ by A. Pavlov [109]. The following result of Volynets [146] provides a better bound than the one given in [109].

Theorem 4.4.1. If $n \to \infty$, $N = \log n + x\sqrt{\log n}$, $x/\sqrt{\log n} \to 0$, then
214 Random permutations Volynets [146] proved this theorem by using the approach based on the gener- generalized scheme of allocation. Let Hn be the set of all single-valued mappings of the set {1,...,«} into itself. In particular, Sn c ?«. The random mappings from T,n were first studied by J. B. Kruskal [94] and B. Harris [57], and many studies have considered subsets of T,n, which are distinguished from En by various constraints on the mappings. We mention only the articles by V. N. Sachkov [128, 129, 130], in which the mappings have the height of less than a fixed number, and cycle lengths are from a fixed set; the articles by A. A. Grusho [54, 55], which treat the subset T,nr that consists of the mappings from ?„ whose vertex degrees are not greater than r; the articles by Yu. Pavlov [114, 115] considering the characteristics of the mappings with exactly m components (the case m = 1 is considered by G. N. Bagaev in [8, 9]); and the article by J. Arney and E. A. Bender [5], which treats mappings with constraints on degrees of the vertices. The research in these directions began in the early seventies and is still ongoing. In our opinion, the most surprising results concerning mappings with constraints were obtained by I. B. Kalugin [64], which we summarize. Let En ,r be the subset of mappings from T,n such that the degrees of the vertices take values only from a set R that contains zero and does not coincide with the set {0, 1}. Let ?(A.) be a random variable with the distribution where A. is a positive constant and There exists ccr such that E^(a^) = 1. Denote by Br the variance of the random variable ^{ocr). For the number of cyclic vertices A.^ and the height tntR of the random mapping from T,n,R, the following assertions are well known [64, 78]. Theorem 4.4.2. Ifn ->• oo, then JnjB~RP{\f = k} = ze-z2'2(\ + 0A)) uniformly in the integers k such that z = k^/BR/n lies in any interval of the form 0 < z$ < z < z\ < oo. Theorem 4.4.3. 
Ifn ->• oo, then for any fixed x > 0, 00 [nxn,R < x] -+ k=—oo
4.4 Notes and references 215 An unexpected result appears if we consider the set S* R of mappings from SM defined as follows. If in the graph of a mapping from En we delete the edges that connect the cyclic vertices, we obtain a graph consisting of trees. The set E* R contains the mappings from !!„ such that the degree of any vertex of the trees takes a value in R. Thus the difference in the restrictions on the degrees in E* R and !!„,# seems to be insignificant because only the restrictions on the degrees of cyclic vertices differ by 1. But the sets T,n,# and E* R have a substantial difference in the structure of their corresponding random graphs. Let AR and t* r, respectively, be the number of cyclic vertices and the height of a random mapping from the set E * R with uniform distribution. For the random variable ?(A.), set If R does not coincide with the set of all nonnegative integers, then cir < 1. Theorem 4.4.4. Ifn^- oo, then P{ A?> = *} = uniformly in the integers k such that z = (k — A — aR)n)/ {b R^/n) lies in any fixed finite interval. Theorem 4.4.5. If n —»• oo a?u/ ? = ?(«) is such that naR —>• yS, where ft is a constant, then for any fixed integer m, <* <t+m} = where the constant kp depends only on ft and the set R. Since t = t(n) is of order \ogn, the random mappings from E* R have many cyclic vertices and, as a consequence, have the height of order log n rather than ¦s/n as in the case for the mappings from ?„,#. A satisfactory explanation for this situation is not known. In Section 4.3, we considered the set Sn^r of all permutations of degree n with cycle lengths from a fixed set R. The interest in such sets may be partly explained by their connection with the equations involving permutations, which we will look at in the next section. 
Another reason for investigating the set $S_{n,R}$ and similar sets of mappings with various restrictions is the possibility (see [5]) of approximating more complicated sets of combinatorial objects by such sets with relatively simple constraints. Partly for these reasons, the asymptotic behavior of the number $a_{n,R}$ of elements in $S_{n,R}$ has been considered in some recent studies [25, 80, 102, 149, 153, 154].
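For small $n$, the numbers $a_{n,R}$ can be computed exactly, which is convenient for checking asymptotic formulas. The sketch below is an illustration only; the function name and the method are assumptions of this example, not taken from the text. It uses the recurrence $a_{n,R} = \sum_{r \in R,\, r \le n} (n-1)(n-2)\cdots(n-r+1)\, a_{n-r,R}$ with $a_{0,R} = 1$, which is obtained by choosing, in order, the $r - 1$ other elements of the cycle that contains the element $n$.

```python
def count_restricted_permutations(n_max, allowed):
    """Return [a_0, ..., a_{n_max}], where a_n is the number of permutations
    of degree n all of whose cycle lengths r satisfy allowed(r)."""
    a = [1]  # a_0 = 1 by convention
    for n in range(1, n_max + 1):
        total = 0
        # falling = (n-1)(n-2)...(n-r+1): ordered choices of the other
        # r - 1 elements of the cycle of length r that contains n
        falling = 1
        for r in range(1, n + 1):
            if allowed(r):
                total += falling * a[n - r]
            falling *= n - r  # prepare the product for cycle length r + 1
        a.append(total)
    return a


if __name__ == "__main__":
    # R = {1, 2}: permutations X with X^2 = e (involutions)
    print(count_restricted_permutations(6, lambda r: r in (1, 2)))
    # R = set of even numbers
    print(count_restricted_permutations(6, lambda r: r % 2 == 0))
```

With `allowed = lambda r: r in (1, p)` for a prime `p`, the same function counts the permutations whose cycles have length 1 or `p`.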
The generating function $f(z)$ for the numbers $a_{n,R}$ of elements in $S_{n,R}$ is
$$f(z) = \sum_{n=0}^{\infty} \frac{a_{n,R}\, z^n}{n!} = \exp\Big\{\sum_{r \in R} \frac{z^r}{r}\Big\}.$$
Therefore it is convenient to apply the saddle-point method to obtain the asymptotics of $a_{n,R}$. By this method, the cases in which the elements of $R$ form an arbitrary arithmetic progression are considered in [25, 107]; see also [130]. The application of the Tauberian-type theorems is another approach that has been used in the investigations of this problem [153, 154, 155].

Let $R(n)$ be the number of elements of $R$ that are not greater than $n$ and let $|A|$ be the number of elements in $A$.

Theorem 4.4.6. Let $n \to \infty$,
$$R(n)/n \to \rho, \qquad 0 < \rho \le 1, \tag{4.4.1}$$
and for $m \ge n$, $m = O(n)$,
$$\frac{1}{n}\,\big|\{k:\ k \le n,\ k \in R,\ m - k \in R\}\big| \to \rho^2. \tag{4.4.2}$$
Then
$$a_{n,R} = (n-1)!\,\exp\{l_{n,R} - \gamma\rho\}/\Gamma(\rho)\,(1 + o(1)), \tag{4.4.3}$$
where
$$l_{n,R} = \sum_{r \in R,\ r \le n} \frac{1}{r},$$
$\gamma$ is the Euler constant, and $\Gamma$ is the Euler gamma function.

Conditions (4.4.1) and (4.4.2) indicate that the set $R$ is similar to a typical realization of a random set containing each positive integer with probability $\rho$ independent of the other integers. As examples of the sets $R$ that satisfy conditions (4.4.1) and (4.4.2), we may take sets of the form
$$R = \{k:\ \{g(k)\} \in A\}, \tag{4.4.4}$$
where $g(t)$ is a real-valued function of $t > 0$, $\{x\}$ is the fractional part of $x$, and $A$ is an interval or a finite union of intervals from $[0, 1]$ with the Lebesgue measure $\rho$. A. L. Yakymiv [154, 155] proved that a set $R$ of the form (4.4.4) satisfies conditions (4.4.1) and (4.4.2) if
$$g(t) = t^{\alpha} l(t),$$
4.4 Notes and references 217 where a is a noninteger positive number, l{t) is a slowly varying function, and as t —> oo, Let a,-R be the number of cycles of length r in a random permutation of Sn,R and let vn_r = a\,R + ¦ • ¦ + an_R be its total number of cycles. Yakymiv [154, 155] proved the following assertions. Theorem 4.4.7. Suppose that conditions D.4.1) and D.4.2) are satisfied and n —> oo. Then the distribution of the random variable (vnyR — ln,R)/y/plogn converges weakly to the standard normal distribution, and for any fixed r e R, the distribution ofa^R converges to the Poisson distribution with parameter 1/r. A case of irregular behavior of an,R is considered in [149]. Theorem 4.4.8. Ifn —>• oo and R = E U M, where E is the set of all even posi- positive numbers and M is a set of odd numbers such that the series 1 meM converges, then e for even n, and ,-b e, , -Of!+*(!)) for odd n. Volynets [149] proved this theorem with the aid of relation D.3.4), in which she uses the representation =n-s}. s,m Here the variables ?[ ,..., %? have the parameter x equal to -JT^TJn, v is the number of these variables taking values in M, rj is the sum of these variables, and ?j ,..., ?J^ are independent identically distributed random variables with the distribution ^ l^^l keE. Note that if b ->• 0, the result of Theorem 4.4.8 transfers continuously to D.3.5).
Theorems 4.3.2, 4.3.3, and 4.3.4 are given in [80]. It can be easily shown that the asymptotics (4.3.11) and (4.4.3) are identical. Thus, quite different sets of conditions yield coinciding results. This coincidence shows that there exist weaker conditions sufficient for the validity of the asymptotics (4.3.11). We give the detailed and cumbersome proof of Theorem 4.3.3 because we conjecture that condition (A) from this theorem and the existence of a positive density of $R$ are sufficient for the validity of (4.3.11) and that it may be possible to simplify the proof.

The research on the sets $S_{n,R}$ of permutations with restrictions on the cycle lengths provides an example of a fruitful competition of various analytical methods of asymptotic analysis such as the saddle-point method, the application of Tauberian-type theorems, and the approach based on the generalized scheme of allocation. Note that it would also be interesting to consider the cases where the density $\rho = 0$.
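The asymptotics (4.4.3) can be illustrated numerically. The sketch below is a hedged illustration, not part of the original exposition: the function names, the choice of the test set $R$ (integers not divisible by 3, density $\rho = 2/3$), and the recurrence are assumptions of this example. With $b_n = a_{n,R}/n!$, differentiating the exponential generating function $\exp\{\sum_{r \in R} z^r/r\}$ gives $n b_n = \sum_{r \in R,\, r \le n} b_{n-r}$, and the code compares $b_n$ with $\exp\{l_{n,R} - \gamma\rho\}/(n\,\Gamma(\rho))$, that is, with the right-hand side of (4.4.3) divided by $n!$. The test set has two consecutive integers in every window of length 4, and its generating function $(z + z^2)/(1 - z^3)$ has poles only at the cube roots of unity, so it appears to satisfy conditions (A) and (B).

```python
from math import exp, lgamma

EULER_GAMMA = 0.5772156649015329  # Euler's constant


def normalized_counts(n_max, allowed):
    """Return [b_0, ..., b_{n_max}] with b_n = a_{n,R} / n!, computed from
    n * b_n = sum of b_{n-r} over the allowed cycle lengths r <= n."""
    b = [1.0]
    for n in range(1, n_max + 1):
        s = sum(b[n - r] for r in range(1, n + 1) if allowed(r))
        b.append(s / n)
    return b


def asymptotic_value(n, allowed, rho):
    """exp(l_{n,R} - gamma * rho) / (n * Gamma(rho)), where l_{n,R} is the
    sum of 1/r over the allowed r <= n."""
    l_n = sum(1.0 / r for r in range(1, n + 1) if allowed(r))
    return exp(l_n - EULER_GAMMA * rho - lgamma(rho)) / n


def ratio(n, allowed, rho):
    """Exact normalized count divided by its conjectured asymptotic value."""
    return normalized_counts(n, allowed)[n] / asymptotic_value(n, allowed, rho)


if __name__ == "__main__":
    for n in (30, 100, 300):
        print(n, ratio(n, lambda r: r % 3 != 0, 2.0 / 3.0))
```

The printed ratios should approach 1 as $n$ grows; for $R$ equal to the set of all positive integers ($\rho = 1$) the same check reduces to $\exp\{1 + 1/2 + \cdots + 1/n - \gamma\}/n \to 1$.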
Equations containing an unknown permutation

5.1. A quadratic equation

If $g$ and $f$ are permutations of degree $n$, then the result of their sequential action $h = fg$ is a permutation of degree $n$ called the product of $g$ and $f$. The set $S_n$ of all permutations of degree $n$ with this operation is the well-known symmetric group of degree $n$. Therefore we can consider equations of the form
$$X^d = a, \tag{5.1.1}$$
where $d$ is a positive integer, $a \in S_n$, and $X$ is an unknown permutation from $S_n$.

In the previous chapter, we considered the set $S_{n,R}$ of all permutations of degree $n$ with cycle lengths from a fixed set $R$ and found the asymptotics for the number of elements in $S_{n,R}$ for some regular sets $R$. The interest in the sets of permutations $S_{n,R}$ may be partly explained by their connection with some equations involving permutations. For example, the set of all solutions of the equation
$$X^p = e \tag{5.1.2}$$
in the symmetric group $S_n$, where $e$ is the identity permutation and $p$ is a prime number, is exactly the set $S_{n,R}$ with $R = \{1, p\}$. Indeed, a permutation $X$ satisfies equation (5.1.2) if and only if its cycles are of length 1 or $p$. Denote by $T_n^{(p)}$ the number of solutions of equation (5.1.2).

Theorem 5.1.1. If $p$ is a prime number, then
$$T_n^{(p)} = \sum_{0 \le k \le n/p} \frac{n!}{(n - pk)!\,k!\,p^k}.$$

Proof. Let $\sigma$ be a random permutation from $S_n$. It is clear that
$$T_n^{(p)} = n!\,P\{\sigma^p = e\},$$
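and the study of $T_n^{(p)}$ is equivalent to the study of the probability $P\{\sigma^p = e\}$. Since $T_n^{(p)} = a_{n,R}$, where $R = \{1, p\}$,
$$\{\sigma^p = e\} = \{\alpha_r = 0,\ r \ne 1,\ r \ne p\} = \{\alpha_1 + p\alpha_p = n\},$$
where $\alpha_r$ is the number of cycles of length $r$ in a random permutation from $S_n$. By (4.1.4),
$$P\{\alpha_1 = n - pk,\ \alpha_p = k,\ \alpha_r = 0,\ r \ne 1,\ r \ne p\} = \frac{1}{(n - pk)!\,k!\,p^k}.$$
Summing these probabilities over admissible values of $k$ yields the assertion of the theorem. ∎

Set $a_{0,R} = 1$ and consider the generating function of the sequence $a_{n,R}$,
$$f_R(z) = \sum_{n=0}^{\infty} \frac{a_{n,R}\, z^n}{n!}.$$

Theorem 5.1.2.
$$f_R(z) = \exp\Big\{\sum_{r \in R} \frac{z^r}{r}\Big\}.$$

Proof. According to (4.1.5),
$$\sum_{n=0}^{\infty} u^n \varphi_n(t_1, \dots, t_n) = \exp\Big\{\sum_{n=1}^{\infty} \frac{u^n t_n}{n}\Big\}, \tag{5.1.3}$$
where
$$\varphi_n(t_1, \dots, t_n) = \sum_{m_1, \dots, m_n} P\{\alpha_1 = m_1, \dots, \alpha_n = m_n\}\, t_1^{m_1} \cdots t_n^{m_n},$$
and $\alpha_r$ is the number of cycles of length $r$ in a random permutation from $S_n$. If we put $t_r = 1$ for $r \in R$ and $t_r = 0$ for $r \notin R$, we find that the corresponding generating function $\varphi_n(t_1, \dots, t_n)$ is
$$\sum_{M_R} P\{\alpha_r = m_r,\ r \in R,\ \alpha_r = 0,\ r \notin R\},$$
where
$$M_R = \Big\{m_1, \dots, m_n:\ \sum_{r \in R} r m_r = n\Big\}.$$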
5.1 A quadratic equation 221 It is easy to see that P{ar =mr, r e R, ar = 0, r ? R) = P I ?/ar =n Mr IreR Thus, substituting tr = 1 if r e R and tr = 0 if r (? R into E.1.3) shows that the generating function for rccr = n I reR an,R equals 00 n=0 n\ = exp E- E.1.4) reR In view of Theorem 5.1.2, it is convenient to apply the saddle-point method to obtain asymptotics of TnP . In the next section, we will use a different approach based on the generalized scheme of allocation; however, for comparison, we now B) present the derivation of the asymptotics of !„ by applying the saddle-point method. Theorem 5.1.3. As n ->• oo, Proof. Since oo TB) » by Cauchy's formula F(n) = /2 integrating over an arbitrary contour that goes around the point z = 0. We can write and choose the contour of integration to be the circle passing through the saddle point q, where the derivative of the function z2 /(z) = z + — - n log z
222 Equations containing an unknown permutation is zero. From the equation we find that /(z)= 1 +z-- = 0, z Thus, setting z = Qelfp,n < <p < n shows that 2rti J z In J_n For the sake of brevity, we let a = q sin <p + (q2 sin 2(p)/2 — ncp and write the integral in the form pq+q2/2 :f, Q+Q2/2 2ngn Since F{n) is real, we see that ee+Q2/2 F(n) = We choose e = q~3/4 and estimate the integral outside the ^-neighborhood of zero, as n —>• oo, taking into account that q = -sjn + 1/4 — 1/2 —>• oo. The integrand is even, so we only estimate the integral over cp, 0 < cp < 7r. It is convenient to consider the graphs of the functions cos <p and cos 2cp included in the exponent. With the help of the graphs presented in Figure 5.1.1, we can easily see that < r12 e-QH Js Js since 1 — cos 2s > e2 for sufficiently small e. Similarly, f /2* Jn [ tt/2 Jn/2 <P = -»-Q
5.1 A quadratic equation 223 tt/4, n/2 ¦-,.. 3tt/4 Figure 5.1.1. Graphs of cos <p and cos 2<p Thus F{n) = 2ttq" where e = q~3^'. Since q + q2 — n = 0, we find that, in a neighborhood of zero, a = q sin cp H sin 2cp — ncp = Qcp + Q2cp -n(p + O(e2\(p\3) = O(e2\(p\3), and therefore cosa = 1 + O(a2) = 1 + 0(<?V)- The exponent of the integrand can be represented in the domain of integration as follows: Q2 1 q{\ - coscp) + y A - cos2(p) = -(q + 2Q2)(p2 + O(g2<p4). Thus, for | (p | < s, Therefore f cosae^A-cos^^2A-cos2^2^ = f e^2(e+2^/2^(l + O(q-X/2)).
224 Equations containing an unknown permutation The change of variables 6 = <Jq + 2g2(p gives i r _r ¦ -eJq+Iq1 2Q2) since as x —>• oo, Combining the estimates gives F{n) = eQ+Q2/2 2n Qn J q + 2q2 It remains to substitute q = *Jn + 1/4 - 1/2 into this formula. Since i^/i) = t}2)/n!, we find that 2 Replace log«! by Stirling's formula logn\ = n logn — n H— logn It is easily seen that li + O(n 1). 1/2 E.1.5) E.1.6) E.1.7) E.1.8)
5.2 Equations of prime degree 225 When we use E.1.7), we find n\ogg = -/i log/i +/i log (l - 0I)) 2 \ /i og/i +/i log (l = + + 0I2)) 2 \ 2y/n Sn \nL)) = in,ogn-I^+o(-L). E.L9) Finally, M +— j ). E.1.10) forlogri2): By substituting estimates E.1.6)—E.1.10) into E.1.5), we obtain the final formula \ which implies the assertion of the theorem. \oZn + V« \ logV2 + O 5.2. Equations of prime degree According to D.3.4), the number an<R of permutations in Sn<R can be represented in the form UB^ Y ^f^^if + ...+^ = n}, E.2.1) N=l where x t> E-2-2) keR and ?[ ,... ,%N' are independent identically distributed random variables, and the positive parameter x can be chosen arbitrarily from the domain of conver- convergence of the series in E.2.2). If p is a prime number, then the number Tn of solutions of equation E.1.2) is an<R, where R = {1, p). Therefore XP BR(x)=x + —, P
and by (5.2.1),
$$T_n^{(p)} = \frac{n!\,e^{B(x)}}{x^n}\sum_{N=1}^{\infty}\frac{B^N(x)}{N!}\,e^{-B(x)}\,P\{\zeta_N = n\}, \qquad (5.2.4)$$
where $\zeta_N = \xi_1 + \dots + \xi_N$; $\xi_1, \dots, \xi_N$ are independent identically distributed random variables and
$$P\{\xi_1 = 1\} = \frac{px}{px + x^p}, \qquad P\{\xi_1 = p\} = \frac{x^p}{px + x^p}. \qquad (5.2.5)$$
Thus, to find the asymptotics of $T_n^{(p)}$, it suffices to choose an appropriate value of $x$ and to prove a local limit theorem for the sum $\zeta_N = \xi_1 + \dots + \xi_N$. The summation of independent random variables taking two values is a simple problem that is solved by the de Moivre-Laplace theorem. Therefore the approach based on the representation (5.2.4) seems more suitable here than the saddle-point method. We begin by applying this approach to the proof of Theorem 5.1.3.

Proof of Theorem 5.1.3. If $R = \{1, 2\}$, then obviously
$$P\{\xi_1 = 1\} = \frac{x}{B(x)} = \frac{2}{2+x}, \qquad P\{\xi_1 = 2\} = \frac{x^2}{2B(x)} = \frac{x}{2+x},$$
where $B(x) = B_R(x) = x + x^2/2$, and $E\zeta_N = NE\xi_1 = N(x + x^2)/B(x)$. In the main part of the sum in (5.2.4), the parameter $N$ takes values close to $B(x)$; therefore we choose $x$ such that $x + x^2 = n$. Hence,
$$x = \sqrt{n + 1/4} - \frac{1}{2}, \qquad B(x) = x + \frac{x^2}{2} = \frac{n}{2} + \frac{x}{2}, \qquad E\xi_1 = \frac{x + x^2}{B(x)} = \frac{n}{B(x)},$$
and $D\xi_1 = 2n^{-1/2}(1 + o(1))$ as $n \to \infty$ (where $D$ denotes the variance). Let
$$u = \frac{2(N - B(x))}{\sqrt{B(x)D\xi_1}}, \qquad A = \sqrt{2\log n},$$
and divide the sum from (5.2.4) into two parts so that
$$T_n^{(2)} = \frac{n!\,e^{B(x)}}{x^n}\,(S_1 + S_2), \qquad (5.2.6)$$
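where
$$S_1 = \sum_{N:\,|u|\le A}\frac{B^N(x)}{N!}\,e^{-B(x)}\,P\{\zeta_N = n\}, \qquad S_2 = \sum_{N:\,|u|> A}\frac{B^N(x)}{N!}\,e^{-B(x)}\,P\{\zeta_N = n\}.$$
In the first sum, $N = B(x) + o(\sqrt{B(x)})$, and by using the normal approximation to the Poisson distribution, we obtain, as $n \to \infty$,
$$\frac{B^N(x)}{N!}\,e^{-B(x)} = \frac{1}{\sqrt{2\pi B(x)}}\,(1 + o(1))$$
uniformly in the integers $N$ such that $|u| \le A$. The sum $\zeta_N - N$ has the binomial distribution with $N$ trials and the probability of success $p(x) = x/(2 + x)$. If $|u| \le A$, then $N = B(x)(1 + o(1))$, and
$$Np(x)(1 - p(x)) = \sqrt{n}\,(1 + o(1)) \to \infty$$
as $n \to \infty$. Therefore the normal approximation to the binomial distribution is valid. For $|u| \le A = \sqrt{2\log n}$,
$$\frac{n - NE\xi_1}{\sqrt{ND\xi_1}} = \frac{n(B(x) - N)}{B(x)\sqrt{ND\xi_1}} = -u(1 + o(1)).$$
Therefore, by the de Moivre-Laplace theorem,
$$P\{\zeta_N = n\} = \frac{1}{\sqrt{2\pi B(x)D\xi_1}}\,e^{-u^2/2}(1 + o(1))$$
uniformly in the integers $N$ such that $|u| \le A$. The behavior of the functions $\varphi_1(N) = B^N(x)e^{-B(x)}/N!$ and $\varphi_2(N) = P\{\zeta_N = n\}$ is represented approximately in Figure 5.2.1. The sum $S_1$ can be estimated as follows:
$$S_1 = \sum_{N:\,|u|\le A}\frac{B^N(x)}{N!}\,e^{-B(x)}\,P\{\zeta_N = n\} = \frac{1}{2\sqrt{2\pi B(x)}}\sum_{N:\,|u|\le A}\frac{2}{\sqrt{2\pi B(x)D\xi_1}}\,e^{-u^2/2}(1 + o(1)).$$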
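Figure 5.2.1. The graphs of $\varphi_1(N)$ and $\varphi_2(N)$

The last sum is an integral sum of the function $e^{-u^2/2}$ with step $2(B(x)D\xi_1)^{-1/2}$, so as $n \to \infty$,
$$S_1 = \frac{1}{2\sqrt{2\pi B(x)}}\cdot\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} e^{-u^2/2}\,du\,(1 + o(1)) = \frac{1}{2\sqrt{2\pi B(x)}}\,(1 + o(1)).$$
By virtue of monotonicity, for $|u| > A$, the probability $P\{\zeta_N = n\}$ does not exceed its values at the boundary of the domain $|u| \le A$, and there exists a constant $c$ such that
$$P\{\zeta_N = n\} \le \frac{c}{\sqrt{B(x)D\xi_1}}\,e^{-A^2/2} = \frac{c}{n\sqrt{B(x)D\xi_1}}.$$
Therefore
$$S_2 \le \frac{c}{n\sqrt{B(x)D\xi_1}}\sum_{N=1}^{\infty}\frac{B^N(x)}{N!}\,e^{-B(x)} \le \frac{c}{n\sqrt{B(x)D\xi_1}}.$$
Thus $S_1 + S_2 = S_1(1 + o(1))$, and by substituting this estimate into (5.2.6), we obtain
$$T_n^{(2)} = \frac{n!\,e^{B(x)}}{2x^n\sqrt{2\pi B(x)}}\,(1 + o(1)).$$
It remains to substitute
$$x = \sqrt{n + 1/4} - \frac{1}{2}$$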
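into the formula. It is easily seen that
$$e^{B(x)} = e^{n/2 + \sqrt{n}/2 - 1/4}(1 + o(1)), \qquad x^n = n^{n/2}e^{-\sqrt{n}/2}(1 + o(1)).$$
Therefore
$$T_n^{(2)} = \frac{1}{\sqrt{2}}\,n^{n/2}\exp\Bigl\{-\frac{n}{2} + \sqrt{n} - \frac{1}{4}\Bigr\}(1 + o(1)),$$
and Theorem 5.1.3 with the remainder term of the form $1 + o(1)$ is proved. ∎

We now turn to the case where $p$ is a fixed prime number, $p \ge 3$, and consider the number $T_n^{(p)}$ of solutions of equation (5.1.2).

Theorem 5.2.1. If $n \to \infty$ and $p$ is prime, $p \ge 3$, then
$$T_n^{(p)} = \frac{1}{\sqrt{p}}\Bigl(\frac{n}{e}\Bigr)^{n(1-1/p)}\exp\{n^{1/p}\}\,(1 + o(1)).$$

Proof. The proof is almost the same as the proof of Theorem 5.1.3 given above and is also based on relation (5.2.4). For $R = \{1, p\}$,
$$B(x) = B_R(x) = x + \frac{x^p}{p},$$
and the independent random variables $\xi_1, \dots, \xi_N$ in (5.2.4) have the distribution
$$P\{\xi_1 = 1\} = \frac{x}{B(x)} = \frac{px}{px + x^p}, \qquad P\{\xi_1 = p\} = \frac{x^p}{pB(x)} = \frac{x^p}{px + x^p}.$$
We choose the parameter $x$ such that
$$x + x^p = n. \qquad (5.2.7)$$
Then
$$x = n^{1/p} - \frac{1}{p}\,n^{-1+2/p}(1 + o(1)), \qquad B(x) = x + \frac{x^p}{p} = \frac{n}{p} + \frac{p-1}{p}\,n^{1/p} + o(1),$$
$$p(x) = \frac{x^p}{px + x^p} = 1 - pn^{-1+1/p} + O(n^{-2+2/p}), \qquad E\xi_1 = \frac{p(x + x^p)}{px + x^p} = \frac{n}{B(x)}, \qquad D\xi_1 = (p-1)^2\,p\,n^{-1+1/p}(1 + o(1)).$$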
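Let
$$u = \frac{p(N - B(x))}{\sqrt{B(x)D\xi_1}}, \qquad A = \sqrt{2\log n},$$
and divide the sum in (5.2.4) into two parts so that
$$T_n^{(p)} = \frac{n!\,e^{B(x)}}{x^n}\,(S_1 + S_2),$$
where
$$S_1 = \sum_{N:\,|u|\le A}\frac{B^N(x)}{N!}\,e^{-B(x)}\,P\{\zeta_N = n\}, \qquad S_2 = \sum_{N:\,|u|> A}\frac{B^N(x)}{N!}\,e^{-B(x)}\,P\{\zeta_N = n\}.$$
In the first sum, $N = B(x) + o(\sqrt{B(x)})$ and
$$\frac{B^N(x)}{N!}\,e^{-B(x)} = \frac{1}{\sqrt{2\pi B(x)}}\,(1 + o(1))$$
uniformly in the integers $N$ such that $|u| \le A$. Let $\xi_i^* = (\xi_i - 1)/(p - 1)$, $i = 1, \dots, N$. The sum $\zeta_N^* = \xi_1^* + \dots + \xi_N^*$ has the binomial distribution with $N$ trials and the probability of success
$$p(x) = \frac{x^p}{px + x^p} = 1 - pn^{-1+1/p} + O(n^{-2+2/p})$$
as $n \to \infty$. It is clear that
$$P\{\zeta_N = n\} = P\{\zeta_N^* = (n - N)/(p - 1)\},$$
and if $(n - N)/(p - 1)$ is not an integer, then $P\{\zeta_N = n\} = 0$. Since $E\xi_1 = n/B(x)$, $B(x) = (n/p)(1 + o(1))$, and
$$\frac{(n - N)/(p - 1) - NE\xi_1^*}{\sqrt{ND\xi_1^*}} = \frac{n - NE\xi_1}{\sqrt{ND\xi_1}} = \frac{n(B(x) - N)}{B(x)\sqrt{ND\xi_1}} = -u(1 + o(1))$$
as $n \to \infty$ and $|u| \le A$, by using the de Moivre-Laplace theorem, we obtain
$$P\{\zeta_N = n\} = P\{\zeta_N^* = (n - N)/(p - 1)\} = \frac{p - 1}{\sqrt{2\pi B(x)D\xi_1}}\,e^{-u^2/2}(1 + o(1))$$
uniformly in the integers $N$ such that $(n - N)/(p - 1)$ is an integer and $|u| \le A$.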
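The local normal approximation supplied by the de Moivre-Laplace theorem is easy to examine numerically. The sketch below is an illustration under our own choice of parameters (10000 trials, success probability 0.3), not the parameters of the proof; it compares the exact binomial probability with $e^{-u^2/2}/\sqrt{2\pi Npq}$, where $u$ is the standardized deviation. The function names are hypothetical.

```python
import math


def binomial_pmf(trials, success_prob, k):
    """Exact P{Bin(trials, success_prob) = k}, computed through logarithms."""
    log_pmf = (math.lgamma(trials + 1) - math.lgamma(k + 1)
               - math.lgamma(trials - k + 1)
               + k * math.log(success_prob)
               + (trials - k) * math.log1p(-success_prob))
    return math.exp(log_pmf)


def local_normal_approximation(trials, success_prob, k):
    """de Moivre-Laplace local approximation exp(-u^2/2) / sqrt(2*pi*N*p*q)."""
    variance = trials * success_prob * (1.0 - success_prob)
    u = (k - trials * success_prob) / math.sqrt(variance)
    return math.exp(-u * u / 2.0) / math.sqrt(2.0 * math.pi * variance)


if __name__ == "__main__":
    trials, success_prob = 10_000, 0.3
    for k in (2950, 3000, 3046, 3100):
        exact = binomial_pmf(trials, success_prob, k)
        approx = local_normal_approximation(trials, success_prob, k)
        print(k, exact, approx, exact / approx)
```

The ratio stays close to 1 as long as the standardized deviation $u$ is moderate, which is the regime $|u| \le A$ used in the proof.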
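Therefore
$$S_1 = \sum_{N:\,|u|\le A}\frac{B^N(x)}{N!}\,e^{-B(x)}\,P\{\zeta_N = n\} = \frac{1}{\sqrt{2\pi B(x)}}\sum \frac{p - 1}{\sqrt{2\pi B(x)D\xi_1}}\,e^{-u^2/2}(1 + o(1)),$$
where the summation is over the integers $N$ such that $(n - N)/(p - 1)$ is an integer. The last sum is an integral sum of the function $e^{-u^2/2}$ with step $p(B(x)D\xi_1)^{-1/2}$. Since the summation is over $N$ such that $(n - N)/(p - 1)$ is an integer, that is, only each $(p - 1)$th term is included in the sum, we obtain
$$\sum \frac{p - 1}{\sqrt{2\pi B(x)D\xi_1}}\,e^{-u^2/2} = \frac{1}{p}\cdot\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} e^{-u^2/2}\,du\,(1 + o(1)) = \frac{1}{p}\,(1 + o(1)).$$
Therefore, as $n \to \infty$,
$$S_1 = \frac{1}{p\sqrt{2\pi B(x)}}\,(1 + o(1)).$$
For $|u| > A$, the probability $P\{\zeta_N = n\}$ does not exceed its values at the boundary of the domain $|u| \le A$, and there exists a constant $c$ such that
$$P\{\zeta_N = n\} \le \frac{c}{\sqrt{B(x)D\xi_1}}\,e^{-A^2/2} = \frac{c}{n\sqrt{B(x)D\xi_1}}$$
and
$$S_2 \le \frac{c}{n\sqrt{B(x)D\xi_1}}\sum_{N=1}^{\infty}\frac{B^N(x)}{N!}\,e^{-B(x)} \le \frac{c}{n\sqrt{B(x)D\xi_1}}.$$
Thus $S_1 + S_2 = S_1(1 + o(1))$, and by substituting this estimate into (5.2.6), we obtain
$$T_n^{(p)} = \frac{n!\,e^{B(x)}}{p\,x^n\sqrt{2\pi B(x)}}\,(1 + o(1)). \qquad (5.2.8)$$
It is easily seen that
$$x^n = n^{n/p}\exp\Bigl\{-\frac{n^{1/p}}{p}\Bigr\}(1 + o(1)), \qquad e^{B(x)} = \exp\Bigl\{\frac{n}{p} + \frac{p-1}{p}\,n^{1/p}\Bigr\}(1 + o(1)).$$
When we substitute these expressions into (5.2.8), we obtain the assertion of Theorem 5.2.1. ∎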
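As a numerical illustration, the exact number of solutions of $X^p = e$ can be compared with the leading term of the asymptotics. The sketch below rests on two assumptions that are ours: the elementary recurrence $T_n^{(p)} = T_{n-1}^{(p)} + (n-1)(n-2)\cdots(n-p+1)\,T_{n-p}^{(p)}$ (the point $n$ is either fixed or lies in a cycle of length $p$), and the asymptotic form $p^{-1/2}(n/e)^{n(1-1/p)}\exp\{n^{1/p}\}$ as our reading of Theorem 5.2.1. The function names are illustrative.

```python
import math


def count_solutions_prime(n_max, p):
    """Exact numbers T_n^{(p)}, n = 0..n_max, of permutations whose cycle
    lengths all belong to {1, p}."""
    counts = [1] * (n_max + 1)
    for n in range(1, n_max + 1):
        total = counts[n - 1]  # the point n is a fixed point
        if n >= p:
            # (n-1)(n-2)...(n-p+1): ordered choice of the other p-1 points
            # of the p-cycle that passes through n
            falling = 1
            for i in range(1, p):
                falling *= n - i
            total += falling * counts[n - p]
        counts[n] = total
    return counts


def log_asymptotic_prime(n, p):
    """log of p^{-1/2} (n/e)^{n(1-1/p)} exp(n^{1/p}) (assumed leading term)."""
    return (-0.5 * math.log(p)
            + n * (1.0 - 1.0 / p) * (math.log(n) - 1.0)
            + n ** (1.0 / p))


if __name__ == "__main__":
    p = 3
    counts = count_solutions_prime(1000, p)
    for n in (10, 100, 1000):
        print(n, math.log(counts[n]) - log_asymptotic_prime(n, p))
```

The printed quantity is the logarithm of the ratio of the exact count to the assumed leading term; it should tend to zero slowly, roughly at the rate $n^{-1/p}$.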
A slight refinement of the estimates used in the proof of Theorem 5.2.1 allows us to show that the assertion of the theorem is valid if $p$ tends to infinity slowly, as specified below, where we prove a more general result.

Theorem 5.2.2. If $p$ is prime and $n, p \to \infty$ in such a way that $p/n \to 0$, then
$$T_n^{(p)} = \sqrt{p}\,\Bigl(\frac{n}{e}\Bigr)^{n(1-1/p)}\sum_{k=0}^{\infty}\frac{n^{(m+pk)/p}}{(m + pk)!}\,(1 + o(1)); \qquad (5.2.9)$$
in particular, if $p^{-2}n^{1/p} \to \infty$, then
$$T_n^{(p)} = \frac{1}{\sqrt{p}}\Bigl(\frac{n}{e}\Bigr)^{n(1-1/p)}\exp\{n^{1/p}\}\,(1 + o(1)), \qquad (5.2.10)$$
and if $p^{-1}n^{1/p} \to 0$, then
$$T_n^{(p)} = \Bigl(\frac{n}{e}\Bigr)^{n(1-1/p)} p^{1/2}\,\frac{n^{m/p}}{m!}\,(1 + o(1)), \qquad (5.2.11)$$
where $m = n - p[n/p]$, and $[c]$ is the integer part of $c$.

Proof. The proof is similar to the proof of Theorem 5.2.1, but now we need to trace the effect of the parameter $p$ in the remainder terms of the asymptotic formulas and to use a representation in terms of the Poisson probabilities instead of the representation (5.2.4). It follows from the equation $x + x^p = n$ that under the conditions of the theorem,
$$x = n^{1/p} - \frac{n^{2/p}}{np} + O\Bigl(\frac{n^{3/p}}{n^2p^2}\Bigr), \qquad (5.2.12)$$
$$B = B(x) = \frac{n}{p} + \frac{(p - 1)n^{1/p}}{p} + O\Bigl(\frac{n^{2/p}}{np}\Bigr). \qquad (5.2.13)$$
Therefore it is easy to confirm that
$$p(x) = P\{\xi_1 = p\} = \frac{x^p}{px + x^p} = 1 - pn^{-1+1/p} + O(n^{-2+2/p}).$$
The random variable $(\zeta_N - N)/(p - 1)$ can be represented in the form
$$\frac{\zeta_N - N}{p - 1} = N - \eta_N,$$
where $\eta_N$ has the binomial distribution with $N$ trials and probability of success
$$q = q(x) = 1 - p(x) = pn^{-1+1/p}\bigl(1 + O(n^{-1+1/p})\bigr). \qquad (5.2.14)$$
Therefore it is not difficult to see that for $n = m + p[n/p]$, the probability
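$P\{\zeta_N = n\}$ is nonzero if $N = [n/p] + m + k(p - 1)$, $0 \le k \le [n/p]$, and for such $N$,
$$P\{\zeta_N = n\} = P\{\eta_N = l\} = \binom{N}{l}q^l(1 - q)^{N-l},$$
where $l = m + kp$. Thus, the representation (5.2.4) takes the form
$$T_n^{(p)} = \frac{n!\,e^{B}}{x^n}\sum_{k=0}^{[n/p]}\frac{B^N e^{-B}}{N!}\binom{N}{l}q^l(1 - q)^{N-l}.$$
This results in the representation
$$T_n^{(p)} = \frac{n!\,e^{B}}{x^n}\sum_{k=0}^{[n/p]}\frac{(Bq)^l}{l!}\,e^{-Bq}\;\frac{(B(1 - q))^{N-l}}{(N - l)!}\,e^{-B(1-q)}, \qquad (5.2.15)$$
where $l = m + pk$, $N = [n/p] + m + k(p - 1)$, $m = n - p[n/p]$; and to obtain the basic assertion of the theorem, we must sum the products of two Poisson probabilities. Let
$$s = \sum_{k=0}^{[n/p]}\frac{(Bq)^l}{l!}\,e^{-Bq}\;\frac{(B(1 - q))^{N-l}}{(N - l)!}\,e^{-B(1-q)}, \qquad a = [\ldots],$$
and divide $s$ into two parts,
$$s_1 = \sum_{k:\,|(N-B)B^{-1/2}|\le a}, \qquad s_2 = \sum_{k:\,|(N-B)B^{-1/2}|> a}.$$
Note that $a \to 0$ under the conditions of the theorem, and the normal approximation to the second multiplier
$$\frac{(B(1 - q))^{N-l}}{(N - l)!}\,e^{-B(1-q)} = \frac{1}{\sqrt{2\pi B}}\,(1 + o(1)) \qquad (5.2.16)$$
is valid for all $l$, $N$ such that $|(N - B)B^{-1/2}| \le a$, and outside this region,
$$\frac{(B(1 - q))^{N-l}}{(N - l)!}\,e^{-B(1-q)} \le \frac{c}{\sqrt{2\pi B}}, \qquad (5.2.17)$$
where $c$ is a constant.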
It remains to show that $s_2 = o(s_1)$ and
$$s_1 = \frac{1}{\sqrt{2\pi B}}\sum_{k=0}^{\infty}\frac{(n^{1/p})^{m+pk}}{(m + pk)!}\,e^{-n^{1/p}}\,(1 + o(1)). \qquad (5.2.18)$$
For the sake of brevity, we let $b = Bq$. It follows from (5.2.13) and (5.2.14) that under the conditions of the theorem,
$$b = n^{1/p}\bigl(1 + O(pn^{-1+1/p})\bigr). \qquad (5.2.19)$$
It is clear that
$$s_1 \ge \frac{c_1}{\sqrt{B}}\cdot\frac{b^{[b]+p}e^{-b}}{([b] + p)!},$$
since at least one of the summands with $l$ from the interval $([b], [b] + p)$ is included in the sum $s_1$. On the other hand, the summation over $N > B + a\sqrt{B}$ is the summation over $l$, with $l = m + pk$ such that $l > b + a\sqrt{B} + o(\sqrt{B})$. Let $l_0 = b + a\sqrt{B} + o(\sqrt{B})$. Then,
$$s_2 \le \frac{c}{\sqrt{2\pi B}}\sum_{l\ge l_0}\frac{b^l e^{-b}}{l!} \le \frac{c_2}{\sqrt{B}}\cdot\frac{b^{l_0}e^{-b}}{l_0!},$$
since $b/l_0 \to 0$. Therefore,
$$\frac{s_2}{s_1} \le \frac{c_3\,b^{\,l_0-[b]-p}}{l_0(l_0 - 1)\cdots([b] + p + 1)} = \frac{c_3\,b^{-1}\bigl(b^{\,l_0-[b]-p}\bigr)b^{-(l_0-[b]-p-1)}}{(1 + (l_0 - b)/b)\cdots(1 + ([b] - b + p + 1)/b)} \le \frac{c_3\,b}{l_0},$$
where $c_1$, $c_2$, and $c_3$ are constants. By the choice of $a$, the last bound tends to zero. This estimate, (5.2.16), (5.2.17), and (5.2.19) imply (5.2.18). Assertion (5.2.9) follows from (5.2.15), (5.2.16), (5.2.17), and (5.2.18). If $p^{-2}n^{1/p} \to \infty$, then by using the normal approximation, we obtain
$$\sum_{k=0}^{\infty}\frac{(n^{1/p})^{m+pk}}{(m + pk)!}\,e^{-n^{1/p}} = \frac{1}{p}\,(1 + o(1)).$$
This yields assertion (5.2.10) of the theorem. Assertion (5.2.11) follows from the fact that if $p^{-1}n^{1/p} \to 0$, then
$$\sum_{k=0}^{\infty}\frac{(n^{1/p})^{m+pk}}{(m + pk)!} = \frac{n^{m/p}}{m!}\,(1 + o(1)). \;∎$$
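The representation of $T_n^{(p)}$ as a sum of products of two Poisson probabilities is an exact identity for every positive $x$, not only for the root of $x + x^p = n$, and this can be verified directly. The sketch below assumes the form $T_n^{(p)} = n!\,e^{B}x^{-n}\sum_k \pi(Bq; l)\,\pi(B(1-q); N-l)$ with $l = m + pk$, $N = [n/p] + m + k(p-1)$, where $\pi(\lambda; j)$ denotes the Poisson probability; the exact count is taken from the cycle-type formula $\sum_j n!/((n - pj)!\,j!\,p^j)$. All function names are ours.

```python
import math


def exact_count(n, p):
    """Number of permutations of n points with cycle lengths in {1, p}:
    sum over the number j of p-cycles of n! / ((n - p*j)! * j! * p**j)."""
    return sum(
        math.factorial(n)
        // (math.factorial(n - p * j) * math.factorial(j) * p ** j)
        for j in range(n // p + 1)
    )


def poisson_pmf(lam, k):
    """Poisson probability lam^k e^{-lam} / k!, computed through logarithms."""
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))


def count_via_poisson_products(n, p, x):
    """Evaluate n! e^B x^{-n} * sum_k Poisson(Bq; l) * Poisson(B(1-q); N-l)."""
    big_b = x + x ** p / p
    q = p * x / (p * x + x ** p)       # P{xi_1 = 1}
    m = n % p
    total = 0.0
    for k in range(n // p + 1):
        l = m + p * k                   # number of fixed points
        big_n = n // p + m + k * (p - 1)  # total number of cycles
        total += poisson_pmf(big_b * q, l) * poisson_pmf(big_b * (1 - q), big_n - l)
    return math.factorial(n) * math.exp(big_b) / x ** n * total


if __name__ == "__main__":
    n, p = 12, 5
    print(exact_count(n, p))
    for x in (0.6, 1.7, 3.0):
        print(x, count_via_poisson_products(n, p, x))
```

Because the identity holds for any $x > 0$, the freedom in choosing $x$ can be spent on making the Poisson and binomial factors amenable to normal approximation, which is the point of the choice $x + x^p = n$.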
5.3. Equations of compound degree

In this section, we consider the number $T_n^{(d)}$ of solutions of the equation
$$X^d = e, \qquad (5.3.1)$$
where $d$ is a natural number, $e$ is the identity permutation, and $X$ is an unknown element of the symmetric group $S_n$. The cases where $d$ is a prime number were considered in the previous sections. Let $d$ be a compound number and let $1 = d_0 < d_1 < \dots < d_r = d$ be all different divisors of $d$. A permutation $X$ is a solution of equation (5.3.1) if and only if the lengths of cycles of $X$ belong to the set $\{d_0, \dots, d_r\}$. Therefore $T_n^{(d)}$ is equal to the number $a_{n,R}$ of permutations in $S_{n,R}$, where $R = \{d_0, \dots, d_r\}$. The following is a generalization of Theorems 5.1.3 and 5.2.1.

Theorem 5.3.1. If $n \to \infty$ and $d$ is a fixed number, $d \ge 2$, then
$$T_n^{(d)} = \frac{1}{\sqrt{d}}\,n^{n(1-1/d)}\exp\Bigl\{-n + \sum_{j\mid d}\frac{n^{j/d}}{j}\Bigr\}(1 + o(1))$$
if $d$ is odd, and
$$T_n^{(d)} = \frac{1}{\sqrt{d}}\,n^{n(1-1/d)}\exp\Bigl\{-n + \sum_{j\mid d}\frac{n^{j/d}}{j} - \frac{1}{2d}\Bigr\}(1 + o(1))$$
if $d$ is even.

Note that the summation in the above formulas is over the divisors $j$ of the number $d$, and if we put $d = 2$ and $d = p$, we obtain Theorems 5.1.3 and 5.2.1, respectively.

Proof. Let $1 = d_0 < d_1 < \dots < d_r = d$ be all the divisors of $d$, $R = \{d_0, \dots, d_r\}$,
$$B(x) = \sum_{k\in R}\frac{x^k}{k},$$
and let $\xi_1, \dots, \xi_N$ be independent identically distributed random variables,
$$P\{\xi_1 = k\} = \frac{x^k}{kB(x)}, \qquad k \in R, \qquad (5.3.2)$$
where the positive parameter $x$ can be chosen arbitrarily. Since $d$ is compound, $r \ge 2$. Put $\zeta_N = \xi_1 + \dots + \xi_N$. It is clear that
$$E\xi_1 = \frac{x + x^{d_1} + \dots + x^{d_r}}{B(x)}.$$
We choose the parameter $x$ such that
$$x + x^{d_1} + \dots + x^{d_{r-1}} + x^d = n, \qquad (5.3.3)$$
and in what follows, we consider the random variables $\xi_1, \dots, \xi_N$ with distribution (5.3.2), where $x$ is the solution of this equation. By iteration, it is not difficult to determine that
$$x^d = n - n^{d_{r-1}/d} - \dots - n^{1/d} + o(1) \qquad (5.3.4)$$
if $d$ is odd, and
$$x^d = n - n^{d_{r-1}/d} - \dots - n^{1/d} + \frac{1}{2} + o(1) \qquad (5.3.5)$$
if $d$ is even.

Since $T_n^{(d)} = a_{n,R}$, where $R = \{1, d_1, \dots, d_{r-1}, d\}$, we can use the representation (5.2.1) and obtain
$$T_n^{(d)} = \frac{n!\,e^{B(x)}}{x^n}\sum_{N=1}^{\infty}\frac{B^N(x)}{N!}\,e^{-B(x)}\,P\{\zeta_N = n\}. \qquad (5.3.6)$$
Therefore, to obtain the assertions of Theorem 5.3.1, it is sufficient to find the asymptotics of $P\{\zeta_N = n\}$. It is not difficult to see that $E\xi_1 = n/B(x)$ and
$$D\xi_1 = \frac{B(x)(x + d_1x^{d_1} + \dots + dx^d) - n^2}{B^2(x)} = \frac{\sum_{j\mid d}\frac{x^j}{j}\,\sum_{j\mid d} jx^j - \bigl(\sum_{j\mid d}x^j\bigr)^2}{\bigl(\sum_{j\mid d}\frac{x^j}{j}\bigr)^2},$$
where the summation is over the integers $j$, which are the divisors of $d$. In view of (5.3.4) and (5.3.5),
$$D\xi_1 = \frac{d}{n}\sum_{j\mid d}\frac{(d - j)^2}{j}\,n^{j/d}\,(1 + o(1)) \qquad (5.3.7)$$
as $n \to \infty$. By estimating the second and third central moments of $\xi_1$ and using the characteristic function of $\zeta_N$, we can prove that the distribution of the random variable $(\zeta_N - NE\xi_1)/\sqrt{ND\xi_1}$ converges to the normal law with parameters $(0, 1)$ as $ND\xi_1 \to \infty$. If $h$ is the maximal step of the lattice containing the set $R$, then the local limit theorem is valid on this lattice. We omit the proof of this local theorem.
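The remaining part of the proof of Theorem 5.3.1 repeats the corresponding part of the proof of Theorem 5.1.3 from Section 5.2. We put
$$v = \frac{n - NE\xi_1}{\sqrt{ND\xi_1}}, \qquad u = \frac{d(N - B(x))}{\sqrt{B(x)D\xi_1}}, \qquad A = 2\sqrt{2\log n},$$
and divide the sum from (5.3.6) into two parts so that
$$T_n^{(d)} = \frac{n!\,e^{B(x)}}{x^n}\,(S_1 + S_2),$$
where
$$S_1 = \sum_{N:\,|u|\le A}\frac{B^N(x)}{N!}\,e^{-B(x)}\,P\{\zeta_N = n\}, \qquad S_2 = \sum_{N:\,|u|> A}\frac{B^N(x)}{N!}\,e^{-B(x)}\,P\{\zeta_N = n\}.$$
It is easy to see that $N = B(x)(1 + o(1))$ for $|u| \le A = 2\sqrt{2\log n}$ and
$$v = \frac{n(B(x) - N)}{B(x)\sqrt{ND\xi_1}} = -u(1 + o(1)), \qquad (5.3.8)$$
and by the local limit theorem,
$$P\{\zeta_N = n\} = \frac{h}{\sqrt{2\pi B(x)D\xi_1}}\,e^{-u^2/2}(1 + o(1))$$
uniformly in the integers $N$ such that $|u| \le A$ and $(n - N)/h$ are integers. Recall that $h$ is the maximal span of the distribution of $\xi_1$. As in the proof of Theorem 5.1.3, Section 5.2, we obtain
$$S_1 = \frac{1}{\sqrt{2\pi B(x)}}\sum \frac{h}{\sqrt{2\pi B(x)D\xi_1}}\,e^{-u^2/2}(1 + o(1)).$$
The last sum is an integral sum of the function $e^{-u^2/2}$, with step $d(B(x)D\xi_1)^{-1/2}$; and the summation is over $N$ such that $(n - N)/h$ are integers, that is, only each $h$th term is included in the sum. Since $h$ and $d$ are relatively prime, we see that
$$\sum \frac{hd}{\sqrt{2\pi B(x)D\xi_1}}\,e^{-u^2/2} = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} e^{-u^2/2}\,du\,(1 + o(1))$$
and
$$S_1 = \frac{1}{d\sqrt{2\pi B(x)}}\,(1 + o(1)).$$
In estimating $S_2$, it will not be possible now to use the monotonicity of the tails of the function $\varphi_2(N) = P\{\zeta_N = n\}$ as we did in the proof of Theorem 5.1.3 in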
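Section 5.2 (see Figure 5.2.1). By (5.3.8), for a sufficiently large $n$, the second sum contains only terms with $|v| > \sqrt{2\log n}$. Therefore, in the second sum,
$$S_2 \le \sum_{N:\,|v|>\sqrt{2\log n}}\frac{B^N(x)}{N!}\,e^{-B(x)}\,P\{\zeta_N = n\}.$$
By the integral limit theorem,
$$\sum_{N:\,|v|>\sqrt{2\log n}} P\{\zeta_N = n\} = O\Bigl(\int_{\sqrt{2\log n}}^{\infty} e^{-z^2/2}\,dz\Bigr),$$
and there exists a constant $c$ such that, in the second sum,
$$\frac{B^N(x)}{N!}\,e^{-B(x)} \le \frac{c}{\sqrt{B(x)}}.$$
Thus, $S_1 + S_2 = S_1(1 + o(1))$, and we obtain
$$T_n^{(d)} = \frac{n!\,e^{B(x)}}{d\,x^n\sqrt{2\pi B(x)}}\,(1 + o(1)). \qquad (5.3.9)$$
This implies the assertions of the theorem because
$$e^{B(x)} = \exp\Bigl\{\sum_{j\mid d}\frac{x^j}{j}\Bigr\}$$
and $x^n$ can be represented in the cases of odd and even $d$ as follows. Let $d$ be odd, then according to (5.3.4),
$$x^n = n^{n/d}\exp\Bigl\{-\frac{1}{d}\sum_{j\mid d,\; j<d} n^{j/d} + o(1)\Bigr\}.$$
For $1 \le j < d$,
$$x^j = n^{j/d} + o(1),$$
and for $j = d$,
$$x^d = n - n^{d_{r-1}/d} - \dots - n^{1/d} + o(1).$$
Thus
$$e^{B(x)} = \exp\Bigl\{\sum_{j\mid d}\frac{n^{j/d}}{j} - \frac{1}{d}\sum_{j\mid d,\; j<d} n^{j/d} + o(1)\Bigr\}$$
and
$$x^{-n}e^{B(x)} = n^{-n/d}\exp\Bigl\{\sum_{j\mid d}\frac{n^{j/d}}{j} + o(1)\Bigr\}.$$
When we substitute the last expression into (5.3.9), we obtain the first assertion of the theorem.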
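For a compound degree the count can again be examined numerically. The sketch below is an illustration under stated assumptions: it uses the elementary recurrence $T_n^{(d)} = \sum_{j\mid d}(n-1)(n-2)\cdots(n-j+1)\,T_{n-j}^{(d)}$ (the point $n$ lies in a cycle whose length $j$ divides $d$), checks it against a brute-force enumeration of $S_n$ for small $n$, and compares it with the leading term $d^{-1/2}n^{n(1-1/d)}\exp\{-n + \sum_{j\mid d} n^{j/d}/j\}$, corrected by $-1/(2d)$ in the exponent for even $d$, which is our reading of Theorem 5.3.1. The function names are hypothetical.

```python
import math
from itertools import permutations


def divisors(d):
    """All positive divisors of d in increasing order."""
    return [j for j in range(1, d + 1) if d % j == 0]


def count_solutions(n_max, d):
    """Exact numbers T_n^{(d)}, n = 0..n_max, of solutions of X^d = e in S_n."""
    divs = divisors(d)
    counts = [1] * (n_max + 1)
    for n in range(1, n_max + 1):
        total = 0
        for j in divs:
            if j > n:
                break
            falling = 1  # (n-1)...(n-j+1): the other points of the j-cycle
            for i in range(1, j):
                falling *= n - i
            total += falling * counts[n - j]
        counts[n] = total
    return counts


def brute_force_count(n, d):
    """Count permutations X of {0..n-1} with X^d = identity by enumeration."""
    identity = tuple(range(n))
    count = 0
    for perm in permutations(range(n)):
        power = identity
        for _ in range(d):
            power = tuple(perm[i] for i in power)  # compose with perm once more
        if power == identity:
            count += 1
    return count


def log_asymptotic(n, d):
    """log of the assumed leading term of T_n^{(d)}."""
    value = -0.5 * math.log(d) + n * (1.0 - 1.0 / d) * math.log(n) - n
    value += sum(n ** (j / d) / j for j in divisors(d))
    if d % 2 == 0:
        value -= 1.0 / (2 * d)
    return value


if __name__ == "__main__":
    d = 4
    counts = count_solutions(2000, d)
    for n in (20, 200, 2000):
        print(n, math.log(counts[n]) - log_asymptotic(n, d))
```

The brute-force routine is only feasible for very small $n$; it serves as an independent check of the recurrence rather than as a practical method.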
If $d$ is even, we note that $2d_{r-1} = d$ and use (5.3.5) to obtain
$$x^n = n^{n/d}\exp\Bigl\{-\frac{1}{d}\sum_{j\mid d,\; j<d} n^{j/d} + o(1)\Bigr\}.$$
For $1 \le j < d_{r-1}$,
$$x^j = n^{j/d} + o(1),$$
for $j = d$,
$$x^d = n - n^{d_{r-1}/d} - \dots - n^{1/d} + \frac{1}{2} + o(1),$$
and for $j = d_{r-1}$,
$$x^{d_{r-1}} = n^{1/2} - \frac{1}{2} + o(1).$$
Thus
$$x^{-n}e^{B(x)} = x^{-n}\exp\Bigl\{\sum_{j\mid d}\frac{x^j}{j}\Bigr\} = n^{-n/d}\exp\Bigl\{\sum_{j\mid d}\frac{n^{j/d}}{j} - \frac{1}{2d} + o(1)\Bigr\}.$$
The substitution of the last expression into (5.3.9) gives us the second assertion of the theorem. ∎

5.4. Notes and references

The study of equations of the form $X^d = e$ in the symmetric group $S_n$ is directly related to one of the significant characteristics of the elements of $S_n$: the order of permutations. By the order $O_n(s)$ of a permutation $s \in S_n$, we mean the least positive integer $k$ such that $s^k$ is the identity permutation. The orders of elements in $S_n$ vary from 1 to the maximal value $G(n)$ over all $s \in S_n$. E. Landau [95] shows that
$$\lim_{n\to\infty}\frac{\log G(n)}{\sqrt{n\log n}} = 1.$$
In spite of such a wide range of $\log O_n(s)$, the typical values of $\log O_n(s)$ are considerably less than $\log G(n)$ and are concentrated near $2^{-1}\log^2 n$. Let $O_n$ be the order of a random permutation from $S_n$ with uniform distribution. The following assertion is well known.

Theorem 5.4.1. For any fixed $x$,
$$\lim_{n\to\infty} P\Bigl\{\bigl(\log O_n - 2^{-1}\log^2 n\bigr)\big/\sqrt{3^{-1}\log^3 n} \le x\Bigr\} = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{x} e^{-u^2/2}\,du.$$
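The two quantities discussed here, the order of a permutation and the maximal order $G(n)$, are simple to compute for small $n$. The following sketch is an illustration only: it takes the order as the least common multiple of the cycle lengths and finds $G(n)$ by enumerating the partitions of $n$ into parts of size at least 2 (the remaining points are fixed). The function names are ours, and the enumeration is practical only for small $n$.

```python
import math


def cycle_lengths(perm):
    """Cycle lengths of a permutation given as a list: i is mapped to perm[i]."""
    seen = [False] * len(perm)
    lengths = []
    for start in range(len(perm)):
        if seen[start]:
            continue
        length = 0
        j = start
        while not seen[j]:
            seen[j] = True
            j = perm[j]
            length += 1
        lengths.append(length)
    return lengths


def order(perm):
    """Order of a permutation: the least common multiple of its cycle lengths."""
    result = 1
    for length in cycle_lengths(perm):
        result = result * length // math.gcd(result, length)
    return result


def landau_g(n):
    """Maximal order G(n) of an element of S_n, by enumeration of partitions."""
    best = 1

    def extend(remaining, max_part, current_lcm):
        nonlocal best
        best = max(best, current_lcm)
        # add one more cycle of length `part` (nonincreasing parts, part >= 2)
        for part in range(min(remaining, max_part), 1, -1):
            extend(remaining - part, part,
                   current_lcm * part // math.gcd(current_lcm, part))

    extend(n, n, 1)
    return best


if __name__ == "__main__":
    print(order([1, 0, 3, 4, 2]))              # one 2-cycle and one 3-cycle
    print([landau_g(n) for n in range(1, 21)])
```

A permutation $s$ solves $X^d = e$ exactly when `order(s)` divides $d$, which is the link between the counts $T_n^{(d)}$ and the distribution of the order.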
The asymptotic normality of $\log O_n$ was first proved by P. Erdős and P. Turán [39]. Other proofs of Theorem 5.4.1 can be found in [106, 18, 27]. All the proofs are rather cumbersome and involve many analytical difficulties. From our point of view, the simplest proof, but still not a sufficiently simple one, is suggested in [78], where the approach based on the generalized scheme is used.

It seems to us that investigating the numbers of solutions of equations of the form $X^d = e$ could provide the basis for the study of the local behavior of $O_n$. Indeed, if $p$ is prime, then $T_n^{(p)} - 1$ is just the number of permutations $s \in S_n$ whose order $O_n(s) = p$. Since the leading term of the asymptotics of the number $T_n^{(d)}$ for a compound $d$ is $(n/e)^{n(1-1/d)}$, almost all permutations counted by $T_n^{(d)}$ probably have the order $d$. It would be of considerable interest to find the asymptotics of the local probabilities $P\{O_n = d\}$ for $d$ that lie in a neighborhood of $\exp\{2^{-1}\log^2 n\}$ and to see whether the integral limit theorem follows from these results in spite of the fact that the behavior of the probabilities $P\{O_n = d\}$ is likely to be rather complicated. By virtue of the irregularity of the behavior of $P\{O_n = d\}$, this problem is not as trivial as obtaining the integral limit theorem from the local theorem usually is, because now we have to obtain the local theorem for $d$ of a specified form and, in addition, we have to know how many $d$ of such a form exist.

Theorems 5.1.1 and 5.1.2 for $R = \{1, 2\}$ and Theorem 5.1.3 were proved in [32]. Theorem 5.1.2 for $R = \{1, p\}$, $p > 2$, was proved in [61], and for an arbitrary $R$ in [33]. Theorem 5.1.3 was proved in [103], where the result of Theorem 5.2.1 was also presented. Assertion (5.2.9) of Theorem 5.2.2 was proved by the saddle-point method in [144]. Theorem 5.3.1 was proved in [108, 145, 150] independently and almost simultaneously.
The approach based on the generalized scheme of allocation, presented in Chapter 5 of this book, was first published in [82], where the proof of Theorem 5.1.3 was realized with the help of this approach. The proof of Theorem 5.3.1 in Section 5.3 follows A. V. Kolchin [68], who, in addition, extended this theorem to the case $d \to \infty$ such that $d\ln\ln n/\ln n \to 0$.

The general conditions of existence of a solution of the equation $X^d = a$, where $a$ is a fixed permutation and $X$ is an unknown permutation from $S_n$, are given in [102]. The system of equations
$$X_1^{m_1} = e, \quad X_2^{m_2} = e, \quad \dots, \quad X_k^{m_k} = e,$$
where $k \ge 2$, $m_1, \dots, m_k$ are fixed natural numbers, $X_1, \dots, X_k \in S_n$, and $e$ is the identity permutation in $S_n$, is considered in [110]. The asymptotic representation of the number of solutions $X = (X_1, \dots, X_k)$ such that $X_iX_j = X_jX_i$ for all $i \ne j$ is found.
BIBLIOGRAPHY

[1] Sh. M. Agadzhanyan. On a general method of estimating the number of graphs from given classes. Avtomatika, (1):10-21, 1981. In Russian.
[2] Sh. M. Agadzhanyan. The asymptotic formulae for the number of m-component graphs. Avtomatika, (4):27-33, 1986. In Russian.
[3] D. J. Aldous. Exchangeability and related topics. Lecture Notes in Math., 1117:1-198, 1985.
[4] D. J. Aldous. Brownian bridge asymptotics for random mappings. Adv. Appl. Probab., 24:763-764, 1992.
[5] J. Arney and E. A. Bender. Random mappings with constraints on coalescence. Pacific J. Math., 103:269-294, 1982.
[6] R. A. Arratia. Independent process approximation for random combinatorial structures. Adv. Appl. Probab., 24:764-765, 1992.
[7] R. Arratia and S. Tavaré. Limit theorems for combinatorial structures via discrete process approximations. Random Structures and Algorithms, 3:321-345, 1992.
[8] G. N. Bagaev. Distribution of the number of vertices in a component of an indecomposable mapping. Belorussian Acad. Sci. Dokl., 21(12):1061-1063, 1977. In Russian.
[9] G. N. Bagaev. Limit distributions of metric characteristics of an indecomposable random mapping. In Combinatorial and Asymptotic Analysis, pp. 55-61. Krasnoyarsk Univ., Krasnoyarsk, 1977. In Russian.
[10] G. N. Bagaev and E. F. Dmitriev. Enumeration of connected labelled bipartite graphs. Belorussian Acad. Sci. Dokl., 28:1061-1063, 1984. In Russian.
[11] G. V. Balakin. On random matrices. Theory Probab. Appl., 12:346-353, 1967. In Russian.
[12] G. V. Balakin. The distribution of random matrices over a finite field. Theory Probab. Appl., 13:631-641, 1968. In Russian.
[13] G. V. Balakin, V. I. Khokhlov, and V. F. Kolchin. Hypercycles in a random hypergraph. Discrete Math. Appl., 2:563-570, 1992.
[14] A. D. Barbour. Refined approximations for the Ewens sampling formula. Adv. Appl. Probab., 24:765, 1992.
[15] A. D. Barbour. Refined approximations for the Ewens sampling formula. Random Structures and Algorithms, 3:267-276, 1992.
[16] E. A. Bender, E. R. Canfield, and B. D. McKay. The asymptotic number of labeled connected graphs with a given number of vertices and edges. Random Structures and Algorithms, 1:127-170, 1990.
[17] E. A. Bender, E. R. Canfield, and B. D. McKay. Asymptotic properties of labeled connected graphs. Random Structures and Algorithms, 3:183-202, 1992.
[18] M. R. Best. The distribution of some variables on a symmetric group. Nederl. Akad. Wetensch. Indag. Math. Proc., 73:385-402, 1970.
[19] L. Bieberbach. Analytische Fortsetzung. Springer-Verlag, Berlin, 1955.
[20] B. Bollobás. The evolution of random graphs. Trans. Amer. Math. Soc., 286:257-274, 1984.
[21] B. Bollobás. Random Graphs. Academic Press, London, 1985.
[22] Yu. V. Bolotnikov. Convergence to the Gaussian and Poisson processes of the variable μ_r(n, N) in the classical occupancy problem. Theory Probab. Appl., 13:39-50, 1968. In Russian.
[23] Yu. V. Bolotnikov. Convergence to the Gaussian process of the number of empty cells in the classical occupancy problem. Math. Notes, 4:97-103, 1968. In Russian.
[24] Yu. V. Bolotnikov. Limit processes in a non-equiprobable scheme of allocating particles into cells. Theory Probab. Appl., 13:534-542, 1968. In Russian.
[25] Yu. V. Bolotnikov. On some classes of random variables on cycles of permutations. Math. USSR Sb., 36:87-99, 1980.
[26] Yu. V. Bolotnikov, V. N. Sachkov, and V. E. Tarakanov. Asymptotic normality of some variables connected with the cyclic structure of random permutations. Math. USSR Sb., 28:107-117, 1976.
[27] J. D. Bovey.
An approximate probability distribution for the order of elements of the symmetric group. Bull. London Math. Soc., 12:41-46, 1980.
[28] V. E. Britikov. Limit theorems on the maximum size of trees in a random forest of non-rooted trees. In Probability Problems of Discrete Mathematics, pp. 84-91. MIEM, Moscow, 1987. In Russian.
[29] V. E. Britikov. The asymptotic number of forests from unrooted trees. Math. Notes, 43:387-394, 1988.
[30] V. E. Britikov. The limit behaviour of the number of trees of a given size in a random forest of nonrooted trees. In Stochastic Processes and Applications, pp. 36-41. MIEM, Moscow, 1988. In Russian.
[31] I. A. Cheplyukova. Emergence of the giant tree in a random forest. Discrete Math. Appl., 8(1):17-34, 1998.
[32] S. Chowla, I. N. Herstein, and K. Moore. On recursions connected with symmetric groups. Canad. J. Math., 3:328-334, 1951.
[33] S. Chowla, I. N. Herstein, and W. R. Scott. The solution of x^d = 1 in symmetric groups. Norske Vid. Selsk., 25:29-31, 1952.
[34] J. M. DeLaurentis and B. G. Pittel. Random permutations and Brownian motion. Pacific J. Math., 119:287-301, 1985.
[35] P. J. Donnelly. Labellings, size-biased permutations and the gem distribution. Adv. Appl. Probab., 24:766, 1992.
[36] P. J. Donnelly, W. J. Ewens, and S. Padmadisastra. Functionals of random mappings: Exact and asymptotic results. Adv. Appl. Probab., 23:437-455, 1991.
[37] P. Erdős and A. Rényi. On the evolution of random graphs. Publ. Math. Inst. Hungarian Acad. Sci., Ser. A, 5(1-2):17-61, 1960.
[38] P. Erdős and A. Rényi. On random matrices. Magyar Tud. Akad. Mat. Kutató Int. Közl., 8:455-461, 1963.
[39] P. Erdős and P. Turán. On some problems of statistical group theory, III. Acta Math. Acad. Hungar., 18(3-4):309-320, 1967.
[40] W. J. Ewens. The sampling theory of selectively neutral alleles. Theoret. Pop. Biol., 3:87-112, 1972.
[41] W. J. Ewens. Sampling properties of random mappings. Adv. Appl. Probab., 24:773, 1992.
[42] M. V. Fedoryuk. Saddle Point Method. Nauka, Moscow, 1977. In Russian.
[43] W. Feller. An Introduction to Probability Theory and Its Applications, vol. 2. Wiley, New York, 1966.
[44] P. Flajolet. The average height of binary trees and other simple trees. Journal of Computer and System Sciences, 25:171-213, 1982.
[45] P. Flajolet. Random tree models in the analysis of algorithms. In P.-J. Courtois and G. Latouche, editors, Performance'87, pp. 171-187. North-Holland, Amsterdam, 1988.
[46] P. Flajolet, D. E. Knuth, and B. Pittel. The first cycles in an evolving graph. Discrete Math., 75:167-215, 1989.
[47] P. Flajolet and A. M. Odlyzko.
Random mapping statistics. In J.-J. Quisquater and J. Vandewalle, editors, Advances in Cryptology, Lecture Notes in Computer Science, Vol. 434, pp. 329-354. Springer-Verlag, Berlin, 1990.
[48] P. Flajolet and M. Soria. Gaussian limiting distributions for the number of components in combinatorial structures. J. Combinatorial Theory, Series A, 53:165-182, 1990.
[49] B. V. Gnedenko and A. N. Kolmogorov. Limit Distributions for Sums of Independent Random Variables. Addison-Wesley, Reading, MA, 1949.
[50] S. W. Golomb. Shift Register Sequences. Aegean Park Press, Laguna Hills, CA, 1982.
[51] V. L. Goncharov. On the distribution of cycles in permutations. Soviet Math. Dokl., 35(9):299-301, 1942. In Russian.
[52] V. L. Goncharov. On the alternation of events in a sequence of Bernoulli trials. Soviet Math. Dokl., 36(9):295-297, 1943. In Russian.
[53] V. L. Goncharov. On the field of combinatorics. Soviet Math. Izv., Ser. Math., 8:3-48, 1944. In Russian.
[54] A. A. Grusho. Random mappings with bounded multiplicity. Theory Probab. Appl., 17:416-425, 1972.
[55] A. A. Grusho. Distribution of the height of mappings of bounded multiplicity. In Asymptotic and Enumerative Problems of Combinatorial Analysis, pp. 7-18. Krasnoyarsk Univ., Krasnoyarsk, 1976. In Russian.
[56] J. C. Hansen. Order statistics for random combinatorial structures. Adv. Appl. Probab., 24:774, 1992.
[57] B. Harris. Probability distributions related to random mappings. Ann. Math. Statist., 31:1045-1062, 1960.
[58] C. C. Heyde. A contribution to the theory of large deviations for sums of independent random variables. Z. Wahrscheinlichkeitstheorie und verw. Gebiete, 7:303-308, 1967.
[59] W. Hoeffding. Probability inequalities for sums of bounded random variables. J. Amer. Statist. Assoc., 58(301):13-30, 1963.
[60] I. A. Ibragimov and Yu. V. Linnik. Independent and Stationary Related Variables. Nauka, Moscow, 1965. In Russian.
[61] E. Jacobsthal. Sur le nombre d'éléments du groupe symétrique S_n dont l'ordre est un nombre premier. Norske Vid. Selsk., 21:49-51, 1949.
[62] S. Janson. Multicyclic components in a random graph process. Random Structures and Algorithms, 4:71-84, 1993.
[63] S. Janson, D. E. Knuth, T. Łuczak, and B. Pittel. The birth of the giant component. Random Structures and Algorithms, 4:233-358, 1993.
[64] I. B. Kalugin. The number of cyclic points and the height of a random mapping with constraints on multiplicities of the vertices.
In Abstracts of the All-Union Conference Probab. Methods in Discrete Math., pp. 35-36. Karelian Branch of the USSR Acad. Sci., Petrozavodsk, 1983. In Russian.
[65] V. I. Khokhlov. On the structure of a non-uniformly distributed random graph. Adv. Appl. Probab., 24:775, 1992.
[66] V. I. Khokhlov and V. F. Kolchin. On the structure of a random graph with nonuniform distribution. In New Trends in Probab. and Statist., pp. 445-456. VSP/Mokslas, Utrecht, 1991.
[67] J. F. C. Kingman. The population structure associated with the Ewens sampling formula. Theoret. Pop. Biol., 11:274-284, 1977.
[68] A. V. Kolchin. Equations in unknown permutations. Discrete Math. Appl., 4:59-71, 1994.
[69] V. F. Kolchin. A class of limit theorems for conditional distributions. Litovsk. Mat. Sb., 8:53-63, 1968. In Russian.
[70] V. F. Kolchin. On the limiting behavior of extreme order statistics in a polynomial scheme. Theory Probab. Appl., 14:458-469, 1969.
[71] V. F. Kolchin. A problem of allocating particles into cells and cycles of random permutations. Theory Probab. Appl., 16:74-90, 1971.
[72] V. F. Kolchin. A problem of the allocation of particles in cells and random mappings. Theory Probab. Appl., 21:48-63, 1976.
[73] V. F. Kolchin. Branching processes, random trees, and a generalized scheme of arrangements of particles. Math. Notes, 21:386-394, 1977.
[74] V. F. Kolchin. Moment of degeneration of a branching process. Math. Notes, 24:954-961, 1978.
[75] V. F. Kolchin. Branching processes and random trees. In Cybernetics, Combinatorial Analysis and Graph Theory, pp. 85-97. Nauka, Moscow, 1980. In Russian.
[76] V. F. Kolchin. Asymptotic Methods of Probability Theory. MIEM, Moscow, 1984. In Russian.
[77] V. F. Kolchin. On the behavior of a random graph near a critical point. Theory Probab. Appl., 31:439-451, 1986.
[78] V. F. Kolchin. Random Mappings. Optimization Software, New York, 1986.
[79] V. F. Kolchin. Systems of Random Equations. MIEM, Moscow, 1988. In Russian.
[80] V. F. Kolchin. On the number of permutations with constraints on their cycle lengths. Discrete Math. Appl., 1:179-194, 1991.
[81] V. F. Kolchin. Cycles in random graphs and hypergraphs. Adv. Appl. Probab., 24:768, 1992.
[82] V. F. Kolchin. The number of permutations with cycle lengths from a fixed set. In Random Graphs'89, pp. 139-149. Wiley, New York, 1992.
[83] V. F. Kolchin. Consistency of a system of random congruences. Discrete Math. Appl., 3:103-113, 1993.
[84] V. F. Kolchin. A classification problem in the presence of measurement errors. Discrete Math. Appl., 4:19-30, 1994.
[85] V.
F. Kolchin. Random graphs and systems of linear equations in finite fields. Random Structures and Algorithms, 5:135-146, 1994.
[86] V. F. Kolchin. Systems of random linear equations with small number of non-zero coefficients in finite fields. In Probabilistic Methods in Discrete Mathematics, pp. 295-304. VSP, Utrecht, 1997.
[87] V. F. Kolchin and V. I. Khokhlov. An allocation problem and moments of the binomial distribution. In Probab. Problems of Discrete Math., pp. 16-21. MIEM, Moscow, 1987. In Russian.
[88] V. F. Kolchin and V. I. Khokhlov. On the number of cycles in a random non-equiprobable graph. Discrete Math. Appl., 2:109-118, 1992.
[89] V. F. Kolchin and V. I. Khokhlov. A threshold effect for systems of random equations of a special form. Discrete Math. Appl., 5:425-436, 1995.
[90] V. F. Kolchin, B. A. Sevastyanov, and V. P. Chistyakov. Random Allocations. Wiley, New York, 1978.
[91] I. N. Kovalenko. A limit theorem for determinants in the class of Boolean functions. Soviet Math. Dokl., 161:517-519, 1965. In Russian.
[92] I. N. Kovalenko. On the limit distribution of the number of solutions of a random system of linear equations in the class of Boolean functions. Theory Probab. Appl., 12:51-61, 1967. In Russian.
[93] I. N. Kovalenko, A. A. Levitskaya, and M. N. Savchuk. Selected Problems of Probabilistic Combinatorics. Naukova Dumka, Kiev, 1986. In Russian.
[94] J. B. Kruskal. The expected number of components under a random mapping function. Amer. Math. Monthly, 61:392-397, 1954.
[95] E. Landau. Handbuch der Lehre von der Verteilung der Primzahlen, vol. 1. Teubner, Berlin, 1909.
[96] A. A. Levitskaya. Theorems on invariance of the limit behaviour of the number of solutions of a system of random linear equations over a finite ring. Cybernetics, (2):140-141, 1978. In Russian.
[97] A. A. Levitskaya. Theorems on invariance for the systems of random linear equations over an arbitrary finite ring. Soviet Math. Dokl., 263:289-291, 1982. In Russian.
[98] A. A. Levitskaya. The probability of consistency of a system of random linear equations over a finite ring. Theory Probab. Appl., 30:339-350, 1985. In Russian.
[99] T. Łuczak. Component behaviour near the critical point of the random graph process. Random Structures and Algorithms, 1:287-310, 1990.
[100] T. Łuczak.
Cycles in a random graph near the critical point. Random Structures and Algorithms, 2:421-439, 1991.
[101] T. Łuczak and B. Pittel. Components of random forests. Comb. Probab. and Comput., 1:35-52, 1992.
[102] M. P. Mineev and A. I. Pavlov. On the number of permutations of a special form. Math. USSR Sb., 99:468-476, 1976. In Russian.
[103] L. Moser and M. Wyman. On the solution of x^d = 1 in symmetric groups. Canad. J. Math., 7:159-168, 1955.
[104] L. R. Mutafchiev. Local limit theorems for sums of power series distributed random variables and for the number of components in labelled relational structures. Random Structures and Algorithms, 3:403-426, 1992.
[105] E. Palmer. Graphical Evolution. Wiley, New York, 1985.
[106] A. I. Pavlov. On the limit distribution of the number of cycles and the logarithm of the order of a class of permutations. Math. USSR Sb., 42:539-567, 1982.
[107] A. I. Pavlov. On the number of cycles and the cycle structure of permutations from some classes. Math. USSR Sb., 46:536-556, 1984.
[108] A. I. Pavlov. On the permutations with cycle lengths from a fixed set. Theory Probab. Appl., 31:618-619, 1986. In Russian.
[109] A. I. Pavlov. Local limit theorems for the number of components of random substitutions and mappings. Theory Probab. Appl., 33:196-200, 1988. In Russian.
[110] A. I. Pavlov. The number and cycle structure of solutions of a system of equations in substitutions. Discrete Math. Appl., 1:195-218, 1991.
[111] Yu. L. Pavlov. The asymptotic distribution of maximum tree size in a random forest. Theory Probab. Appl., 22:509-520, 1977.
[112] Yu. L. Pavlov. Limit theorems for the number of trees of a given size in a random forest. Math. USSR Sb., 32:335-345, 1977.
[113] Yu. L. Pavlov. A case of limit distribution of the maximum size of a tree in a random forest. Math. Notes, 25:387-392, 1979.
[114] Yu. L. Pavlov. Limit distributions of some characteristics of random mappings with a single cycle. In Math. Problems of Modelling Complex Systems, pp. 48-55. Karelian Branch of the USSR Acad. Sci., Petrozavodsk, 1979. In Russian.
[115] Yu. L. Pavlov. Limit theorems for a characteristic of a random mapping. Theory Probab. Appl., 27:829-834, 1981.
[116] Yu. L. Pavlov. Limit distributions of the height of a random forest. Theory Probab. Appl., 28:471-480, 1983.
[117] Yu. L. Pavlov. On the random mappings with constraints on the number of cycles. In Proc. Steklov Inst. Math., pp. 131-142. Nauka, Moscow, 1986.
[118] Yu. L. Pavlov. Some properties of plane planted trees. In Abstr. All-Union Conference on Discrete Math. and its Appl. to Modelling of Complex Systems, p.
14. Irkutsk State Univ., Irkutsk, 1991. In Russian. [119] Yu. L. Pavlov. Some properties of planar planted trees. Discrete Math. Appl., 3:97-102, 1993. [120] Yu. L. Pavlov. The limit distributions of the maximum size of a tree in a random forest. Discrete Math. Appl., 5:301-316, 1995. [121] Yu. L. Pavlov. Limit distributions of the number of trees of a given size in a random forest. Discrete Math. Appl., 6:117-133, 1996. [122] V. V. Petrov. Sums of Independent Random Variables. Springer-Verlag, New York, 1975. [123] B. Pittel. On tree census and the giant component in sparse random graphs. Random Structures and Algorithms, 1:311-342, 1990.
[124] G. Pólya and G. Szegő. Aufgaben und Lehrsätze aus der Analysis. Springer-Verlag, Berlin, 1925.
[125] Yu. V. Prokhorov. Asymptotic behaviour of the binomial distribution. Uspekhi Matem. Nauk, 8(3):135-142, 1953. In Russian.
[126] J. Riordan. Combinatorial Identities. Wiley, New York, 1968.
[127] A. Ruciński and N. C. Wormald. Random graph processes with degree restrictions. Combinatorics, Probability and Computing, 1:169-180, 1992.
[128] V. N. Sachkov. Mappings of a finite set with restraints on contours and height. Theory Probab. Appl., 17:640-656, 1972.
[129] V. N. Sachkov. Random mappings with bounded height. Theory Probab. Appl., 18:120-130, 1973.
[130] V. N. Sachkov. Probability Methods in Combinatorial Analysis. Nauka, Moscow, 1978. In Russian.
[131] A. I. Saltykov. The number of components in a random bipartite graph. Discrete Math. Appl., 5:515-523, 1995.
[132] B. A. Sevastyanov. Convergence of the number of empty cells in the classical allocation problems to Gaussian and Poisson processes. Theory Probab. Appl., 12:144-154, 1967. In Russian.
[133] V. E. Stepanov. On the probability of connectedness of a random graph G_m(t). Theory Probab. Appl., 15:55-67, 1970.
[134] V. E. Stepanov. Phase transition in random graphs. Theory Probab. Appl., 15:187-203, 1970.
[135] V. E. Stepanov. Structure of random graphs G_n(x | h). Theory Probab. Appl., 17:227-242, 1972.
[136] L. Takács. On the height and widths of random rooted trees. Adv. Appl. Probab., 24:771, 1992.
[137] S. G. Tkachuk. Local limit theorems on large deviations in the case of stable limit laws. Izvestiya of Uzbek Academy of Sciences, (2):30-33, 1973. In Russian.
[138] V. A. Vatutin. Branching processes with final types of particles and random trees. Adv. Appl. Probab., 24:771, 1992.
[139] A. M. Vershik and A. A. Shmidt. Symmetric groups of high degree. Soviet Math. Dokl., 13:1190-1194, 1972.
[140] A. M. Vershik and A. A. Shmidt. Limit measures arising in the asymptotic theory of symmetric groups, i. Theory Probab. Appl., 22:78-85, 1977.
[141] A. M. Vershik and A. A. Shmidt. Limit measures arising in the asymptotic theory of symmetric groups, ii. Theory Probab. Appl., 23:36-49, 1978.
[142] V. A. Voblyi. Asymptotic enumeration of labelled connected sparse graphs with a given number of planted vertices. Discrete Analysis, 42:3-16, 1985. In Russian.
[143] V. A. Voblyi. Wright and Stepanov-Wright coefficients. Math. Notes, 42:969-974, 1987.
[144] L. M. Volynets. The number of solutions of an equation in the symmetric group. In Probab. Processes and Appl., pp. 104-109. MIEM, Moscow, 1985. In Russian.
[145] L. M. Volynets. On the number of solutions of the equation x^s = e in the symmetric group. Math. Notes, 40:155-160, 1986. In Russian.
[146] L. M. Volynets. An estimate of the rate of convergence to the limit distribution for the number of cycles in a random substitution. In Probab. Problems of Discrete Math., pp. 40-46. MIEM, Moscow, 1987. In Russian.
[147] L. M. Volynets. The generalized scheme of allocation and the distribution of the number of cycles in a random substitution. In Abstracts of the Second All-Union Conf. Probab. Methods of Discrete Math., pp. 27-28. Petrozavodsk, 1988. In Russian.
[148] L. M. Volynets. The generalized scheme of allocation and the number of cycles in a random substitution. In Probab. Problems of Discrete Math., pp. 131-136. MIEM, Moscow, 1988. In Russian.
[149] L. M. Volynets. An example of a nonstandard asymptotics of the number of substitutions with restrictions on the cycle lengths. In Probab. Processes and Appl., pp. 85-90. MIEM, Moscow, 1989. In Russian.
[150] H. Wilf. The asymptotics of e^{P(z)} and the number of elements of each order in S_n. Bull. Amer. Math. Soc., 15:228-232, 1986.
[151] E. M. Wright. The number of connected sparsely edged graphs, iii. J. Graph Theory, 4:393-407, 1980.
[152] E. M. Wright. The number of connected sparsely edged graphs, iv. J. Graph Theory, 7:219-229, 1983.
[153] A. L. Yakymiv. On the distribution of the number of cycles in random a-substitutions. In Abstracts of the Second All-Union Conference Probab. Methods in Discrete Math., p. 111. Karelian Branch of the USSR Acad. Sci., Petrozavodsk, 1988. In Russian.
[154] A. L. Yakymiv. Substitutions with cycle lengths from a fixed set. Discrete Math. Appl., 1:105-116, 1991.
[155] A. L. Yakymiv. Some classes of substitutions with cycle lengths from a given set. Discrete Math. Appl., 3:213-220, 1993.
[156] N. Zierler. Linear recurring sequences. J. Soc. Ind. Appl. Math., 7:31-48, 1959.
INDEX

algorithm A2, 173
characteristic function, 8
classical scheme of allocation, 16
complete description of distribution of the number of cycles, 192
connected component, 23
connectivity, 22
critical graph, 91
critical set, 125
decomposable property, 23
distribution function, 2
equations involving permutations, 219
equations of compound degree, 235
equations of prime degree, 225
factorial moment, 3
feedback point, 124
forest, 21
forest of nonrooted trees, 30
generalized scheme of allocation, 14
generating function, 6
graphs with components of two types, 70
homogeneous system of equations, 125
hypercycle, 126
independent critical sets, 125
inversion formula, 10
length of the maximum cycle, 212
limit distribution of the number of hypercycles, 164
linearly independent solutions, 125
local limit theorem, 10
mathematical expectation, 3
maximal span, 9
maximum number of independent critical sets, 125, 135
maximum size of components, 66
maximum size of components of a random graph, 84
maximum size of trees in a random forest, 48
maximum size of trees in a random graph, 83
maximum size of trees in a random graph from A_{n,t}, 71
mean, 3
mean number of solutions in the equiprobable case, 132
method of coordinate testing, 168
multinomial distribution, 15
multiplicity of a vertex in a set of hyperedges, 126
nonequiprobable graph, 109
normal distribution, 9
number of components, 107, 144
number of components in U_n, 65
number of cycles in a random permutation, 182, 183
number of cycles of length r in a random permutation, 182
number of forests, 31
number of linearly independent solutions, 130
number of nontrivial solutions, 131
number of trees of fixed sizes, 71
number of trees with r vertices, 42
number of unicyclic components, 77, 81
number of unicyclic graphs, 58
number of vertices in the maximal unicyclic component, 81
number of vertices in unicyclic components, 71, 77
order of random permutation, 239
order statistics, 17
partition, 19
permutations with restrictions on cycle lengths, 192
Poisson distribution, 6
probability, 1
probability distribution, 2
probability of consistency, 144
probability of reconstruction of the true solution, 166
probability space, 1
problem of moments, 4
process of sequential growth of the number of rows, 127
random element, 1
random forest, 21, 30
random graph corresponding to random permutation, 181
random graph of a random permutation, 182
random graphs with independent edges, 100
random matrices with independent elements, 126
random pairwise comparisons, 164
random partitions, 30
random permutation, 28
random variable, 1
rank of matrix, 124
rank of random sparse matrices, 135
reconstructing the true solution, 165
saddle-point method, 7, 221
set of rooted trees, 21
shift register, 123
simple classification problem, 122
single-valued mapping, 18
statistical problem of testing the hypotheses H_0 and H_1, 180
subcritical graph, 91
summation of independent random variables in GF(2), 131
supercritical graph, 91
system of linear equations in GF(2), 122
system of random equations with distorted right-hand sides, 180
system with at most two unknowns in each equation, 156
threshold property, 156
total number of components, 24
total number of critical sets, 157
total number of cycles, 102, 212
total number of hypercycles, 158
unicyclic graph, 58
voting algorithm, 165
weak convergence, 2