Self-Similarity in Population Dynamics: Surname Distributions and Genealogical Trees

The frequency distribution of surnames is relevant not only to historical demography but also to population biology, and especially to genetics, since surnames tend to behave like neutral genes and propagate like Y chromosomes. The stochastic dynamics leading to the observed scale-invariant distributions has been studied as a Yule process, as a branching phenomenon, and by field-theoretical renormalization group techniques. In the absence of mutations the theoretical models agree well with empirical evidence, but when mutations are present the theoretical and observed exponents disagree. Hints about the possible origin of this mismatch are discussed, with emphasis on the difference between the asymptotic frequency distribution of a full population and the frequency distributions observed in its samples. A precise connection is established between surname distributions and the statistical properties of genealogical trees. Ancestor tables, being self-similar by construction, may be investigated theoretically by renormalization group techniques, but they can also be studied empirically by exploiting the large online genealogical databases concerning European nobility.
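
To make the Yule-type picture concrete, the following is a minimal sketch of a pure-birth surname process in the spirit of Simon's variant of the Yule model. It is not the model analyzed in the paper. It assumes a single founder, no deaths, and a fixed probability that each newborn carries a new surname (a "mutation"); otherwise the newborn inherits the surname of a uniformly chosen existing individual. The names `simulate_surnames`, `size_histogram`, and `mutation_rate` are hypothetical and chosen only for illustration.

```python
import random
from collections import Counter


def simulate_surnames(n_births, mutation_rate, seed=0):
    """Grow a population one birth at a time (assumed pure-birth model).

    Each newborn either founds a new surname with probability
    `mutation_rate`, or inherits the surname of a uniformly chosen
    existing individual, so large families grow proportionally faster.
    Returns a Counter mapping surname label -> number of bearers.
    """
    rng = random.Random(seed)
    population = [0]   # surname label of each individual; one founder
    next_label = 1
    for _ in range(n_births):
        if rng.random() < mutation_rate:
            population.append(next_label)   # mutation: brand-new surname
            next_label += 1
        else:
            population.append(rng.choice(population))  # inherit a surname
    return Counter(population)


def size_histogram(family_sizes):
    """Map family size k -> number of surnames with exactly k bearers."""
    return Counter(family_sizes.values())


families = simulate_surnames(20000, 0.05, seed=1)
hist = size_histogram(families)
```

Plotting `hist` on log-log axes is one way to inspect the heavy tail that models of this type tend to produce. Drawing a random subset of `population` before counting would illustrate the distinction the abstract makes between a full population and its samples.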
