Random Combinatorial Structures and Prime Factorizations

Introduction

Many combinatorial structures decompose into components, with the list of component sizes carrying substantial information. An integer factors into primes. This is a similar situation, but it differs in that the list of sizes of the factors carries all the information needed to identify the integer. The combinatorial structures to keep in mind include permutations, mappings from a finite set into itself, polynomials over finite fields, partitions of an integer, partitions of a set, and graphs. The similar behavior of prime factorizations and cycle decompositions of permutations was observed by Knuth and Trabb Pardo [24]. We attempt to explain why such systems are similar.

We are interested in probability models that pick “a random combinatorial structure of size n”, meaning that each of the objects of that size is equally likely. We also consider the model that picks an integer uniformly from 1 to n. Such models lead to stochastic processes that count the number of components of each conceivable size. What are the common features of these processes?

There are two broad areas of commonality. The first and most basic is essentially an algebraic property. It involves representing the distribution of the combinatorial process as that of a sequence of independent but not identically distributed random variables, conditioned on the value of a weighted sum; see (5). All our combinatorial examples satisfy this exactly. On the other hand, the prime factorization of a uniformly chosen integer cannot be described by conditioning a process of independent random variables on the value of a weighted sum, because in this case the value of the weighted sum tells us the value of the random integer. However, by regarding conditioning as a special case of the more general construction of “biasing” a distribution, we can view prime factorization as having a very close relative of the conditioning property.
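For permutations, the conditioning property has a well-known concrete form: the cycle counts (C_1, ..., C_n) of a uniform random permutation of n have the same distribution as independent Z_i ~ Poisson(1/i), conditioned on Z_1 + 2Z_2 + ... + nZ_n = n. The sketch below checks this exactly for small n by comparing Cauchy's formula with the conditioned Poisson probabilities. It also illustrates one possible reading of “biasing” for integers, under an assumption of our own: with independent exponents Z_p satisfying P(Z_p = k) = (1 - 1/p)p^(-k) for primes p ≤ n, the product M = ∏ p^(Z_p) has P(M = m) proportional to 1/m, so reweighting by m and restricting to m ≤ n yields the uniform distribution on 1, ..., n. All function names are hypothetical. This is an illustration, not necessarily the exact construction used in this paper.

```python
from fractions import Fraction
from math import exp, factorial, isclose, isqrt


def cycle_types(n, max_part=None):
    """Yield each cycle type of a permutation of n (an integer partition of n) as {i: c_i}."""
    if max_part is None:
        max_part = n
    if n == 0:
        yield {}
        return
    for i in range(min(n, max_part), 0, -1):  # largest part first, so each partition appears once
        for rest in cycle_types(n - i, i):
            counts = dict(rest)
            counts[i] = counts.get(i, 0) + 1
            yield counts


def cauchy_probability(counts):
    """Cauchy's formula: P(C = c) = prod_i 1 / (i**c_i * c_i!) for a uniform random permutation."""
    p = Fraction(1)
    for i, c in counts.items():
        p /= i ** c * factorial(c)
    return p


def poisson_pmf(lam, k):
    return exp(-lam) * lam ** k / factorial(k)


def conditioned_poisson(n):
    """P(Z = c | Z_1 + 2 Z_2 + ... + n Z_n = n) for independent Z_i ~ Poisson(1/i)."""
    joint = {}
    for counts in cycle_types(n):  # exactly the outcomes whose weighted sum equals n
        p = 1.0
        for i in range(1, n + 1):
            p *= poisson_pmf(1.0 / i, counts.get(i, 0))
        joint[tuple(sorted(counts.items()))] = p
    total = sum(joint.values())  # P(weighted sum = n)
    return {key: p / total for key, p in joint.items()}


def primes_up_to(n):
    return [p for p in range(2, n + 1) if all(p % d for d in range(2, isqrt(p) + 1))]


def product_pmf(m, primes):
    """P(M = m) for M = prod_p p**Z_p with independent P(Z_p = k) = (1 - 1/p) * p**(-k)."""
    prob, rest = Fraction(1), m
    for p in primes:
        k = 0
        while rest % p == 0:  # exponent of p in m
            rest //= p
            k += 1
        prob *= (1 - Fraction(1, p)) / p ** k
    return prob if rest == 1 else Fraction(0)


def biased_distribution(n):
    """Reweight P(M = m) by m and restrict to 1 <= m <= n (an assumed form of biasing)."""
    primes = primes_up_to(n)
    weights = {m: m * product_pmf(m, primes) for m in range(1, n + 1)}
    total = sum(weights.values())
    return {m: w / total for m, w in weights.items()}
```

The permutation check needs a floating-point comparison only because the Poisson probabilities involve exp. The common factor exp(-(1 + 1/2 + ... + 1/n)) cancels in the conditioning. On the integer side, m · P(M = m) equals the same constant ∏(1 - 1/p) for every m ≤ n, so biased_distribution(n) assigns exactly 1/n to each m.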
Conditioning independent random variables on various weighted sums has a long history; for combinatorial examples we refer the reader to Shepp and Lloyd [29], Holst [20], Kolchin [25], Diaconis and Pitman [13], and Arratia and Tavaré [9].

The second broad area of commonality, shared by some but not all of the examples listed above, is an analytic property. For fixed x, the number of components of size at most x has a limit in distribution as n → ∞, and the expected value of this limit is asymptotic to θ log x as x → ∞, where θ > 0 is a constant. We call combinatorial structures with this property “logarithmic”. Among the main examples in this paper, the logarithmic structures are permutations, polynomials, mappings, the Ewens sampling formula, and prime factorizations; the nonlogarithmic structure is that of integer partitions.

Richard Arratia is professor of mathematics at the University of Southern California. His e-mail address is rarratia@math.usc.edu.
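For permutations the constant is θ = 1, and the relevant expectation can be checked exactly for small n. A uniform random permutation of n has E[C_i] = 1/i for each i ≤ n, so for n ≥ x the expected number of cycles of length at most x is the harmonic number H_x = 1 + 1/2 + ... + 1/x, which is asymptotic to log x. The brute-force sketch below confirms the exact identity and shows the slow convergence of H_x / log x to 1. The helper names are hypothetical, and enumerating all n! permutations is feasible only for small n.

```python
from fractions import Fraction
from itertools import permutations
from math import log


def cycle_lengths(perm):
    """Cycle lengths of a permutation of 0..n-1 given in one-line notation."""
    seen = [False] * len(perm)
    lengths = []
    for start in range(len(perm)):
        if not seen[start]:
            length, j = 0, start
            while not seen[j]:  # walk the cycle containing `start`
                seen[j] = True
                j = perm[j]
                length += 1
            lengths.append(length)
    return lengths


def mean_small_cycles(n, x):
    """Exact E[number of cycles of length <= x] over all n! permutations of n elements."""
    total = count = 0
    for perm in permutations(range(n)):
        total += sum(1 for length in cycle_lengths(perm) if length <= x)
        count += 1
    return Fraction(total, count)


def harmonic(x):
    """H_x = 1 + 1/2 + ... + 1/x as an exact fraction."""
    return sum(Fraction(1, i) for i in range(1, x + 1))


if __name__ == "__main__":
    # Exact identity: E[# cycles of length <= x] = H_x whenever n >= x.
    for n, x in [(4, 2), (6, 3), (7, 7)]:
        print(n, x, mean_small_cycles(n, x), harmonic(x))
    # H_x / log x tends to 1 (theta = 1 for permutations), but only slowly.
    for x in (10, 100, 1000, 10000):
        print(x, sum(1 / i for i in range(1, x + 1)) / log(x))
```

Exact fractions make the identity E[...] = H_x an equality test rather than a tolerance check. The ratio printout uses floats, because exact harmonic numbers for large x have very large denominators.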

[1] Simon Tavaré et al. Total Variation Asymptotics for Poisson Process Approximations of Logarithmic Combinatorial Assemblies. 1995.

[2] Philippe Flajolet et al. Gaussian limiting distributions for the number of components in combinatorial structures. J. Comb. Theory, Ser. A, 1990.

[3] G. Sankaranarayanan et al. Ordered cycle lengths in a random permutation. 1971.

[4] Jennie C. Hansen et al. How random is the characteristic polynomial of a random matrix. 1993.

[5] Jennie C. Hansen et al. Order Statistics for Decomposable Combinatorial Structures. Random Struct. Algorithms, 1994.

[6] Carol Bult et al. Permutations. 1994.

[7] Simon Tavaré et al. A Rate for the Erdős-Turán Law. Comb. Probab. Comput., 1994.

[8] J. Kubilius et al. Probabilistic Methods in the Theory of Numbers. 1964.

[9] R. Arratia et al. The Cycle Structure of Random Permutations. 1992.

[10] Patrick Billingsley et al. On the distribution of large prime divisors. 1972.

[11] G. A. Watterson. The stationary distribution of the infinitely-many neutral alleles diffusion model. 1976.

[12] Philippe Flajolet et al. Random Mapping Statistics. EUROCRYPT, 1990.

[13] W. Ewens. The sampling theory of selectively neutral alleles. Theoretical Population Biology, 1972.

[14] Boris Pittel et al. On a Likely Shape of the Random Ferrers Diagram. 1997.

[15] Simon Tavaré et al. Limit Theorems for Combinatorial Structures via Discrete Process Approximations. Random Struct. Algorithms, 1992.

[16] R. Arratia et al. Poisson Process Approximations for the Ewens Sampling Formula. 1992.

[17] W. Vervaat et al. Success epochs in Bernoulli trials (with applications in number theory). 1973.

[18] G. Tenenbaum. Introduction to Analytic and Probabilistic Number Theory. 1995.

[19] Boris G. Pittel et al. Random Set Partitions: Asymptotics of Subset Counts. J. Comb. Theory, Ser. A, 1997.

[20] J. Kingman. Random Discrete Distributions. 1975.

[21] Bert Fristedt et al. The structure of random partitions of large integers. 1993.

[22] N. L. Johnson et al. Discrete Multivariate Distributions. 1998.

[23] L. Holst. A Unified Approach to Limit Theorems for Urn Models. 1979.

[24] G. A. Watterson. The stationary distribution of the infinitely-many, 1976.

[25] Donald E. Knuth et al. Analysis of a Simple Factorization Algorithm. Theor. Comput. Sci., 1976.

[26] Simon Tavaré et al. Independent Process Approximations for Random Combinatorial Structures. arXiv:1308.3279, 1994.