Ancestral population genomics using coalescence hidden Markov models and heuristic optimisation algorithms

With full genome data from several closely related species now readily available, we have the ultimate data for demographic inference. Exploiting these full genomes, however, requires models that explicitly account for recombination along alignments of full chromosomal length. Over the last decade, a class of models based on the sequentially Markov coalescent combined with hidden Markov models has been developed and used to make inferences in simple demographic scenarios. To move on to more complex demographic modelling, we need better and more automated ways of specifying these models, and efficient optimisation algorithms for inferring the parameters of complex and often high-dimensional models. In this paper we present a framework for building such coalescence hidden Markov models for pairwise alignments, together with results from using heuristic optimisation algorithms for parameter estimation. We show that the new framework can build more complex demographic models than our previous frameworks could, and that heuristic optimisation algorithms give more accurate parameter estimates than our previous gradient-based approaches. The framework provides a flexible way of constructing coalescence hidden Markov models almost automatically. While estimating parameters in more complex models remains challenging, we show that heuristic optimisation algorithms still achieve fairly good accuracy.
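
To make the contrast between heuristic and gradient-based estimation concrete, the following is a minimal sketch, not the paper's framework. It assumes a toy two-state hidden Markov model as a stand-in for a pairwise CoalHMM (hidden states loosely playing the role of coalescence-time intervals, a per-site state-switch probability loosely standing in for recombination) and uses particle swarm optimisation as one example heuristic; the abstract does not say which heuristics are used. The functions `forward_loglik`, `pso_maximise` and `simulate`, and all parameter values, are hypothetical illustrations.

```python
import math
import random


def forward_loglik(obs, switch, emit_diff):
    """Scaled forward-algorithm log-likelihood of a 0/1 sequence.

    Hypothetical toy stand-in for a pairwise CoalHMM, not the paper's model:
    `switch` is the per-site probability of moving to another hidden state,
    and state k emits a difference (1) with probability emit_diff[k].
    """
    n = len(emit_diff)
    trans = [[1 - switch if i == j else switch / (n - 1) for j in range(n)]
             for i in range(n)]
    prev, loglik = None, 0.0
    for x in obs:
        # Predict: uniform start distribution, then propagate through trans.
        if prev is None:
            pred = [1.0 / n] * n
        else:
            pred = [sum(prev[j] * trans[j][k] for j in range(n)) for k in range(n)]
        cur = [pred[k] * (emit_diff[k] if x else 1 - emit_diff[k]) for k in range(n)]
        scale = sum(cur)                 # P(x_t | x_1, ..., x_{t-1})
        loglik += math.log(scale)
        prev = [c / scale for c in cur]  # rescale to avoid underflow
    return loglik


def pso_maximise(f, bounds, n_particles=20, n_iter=100, w=0.7, c1=1.5, c2=1.5, seed=1):
    """Maximise f over a box with inertia-weight particle swarm optimisation.

    Uses only function evaluations (no gradients). Positions are clamped to
    `bounds`; the settings here are illustrative assumptions, not tuned values.
    """
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [f(p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull towards personal best + pull towards swarm best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                lo, hi = bounds[d]
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = f(pos[i])
            if val > pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val > gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


def simulate(n, switch, emit_diff, seed=0):
    """Draw a 0/1 sequence of length n from the same toy HMM."""
    rng = random.Random(seed)
    k = len(emit_diff)
    state = rng.randrange(k)
    obs = []
    for _ in range(n):
        obs.append(1 if rng.random() < emit_diff[state] else 0)
        if rng.random() < switch:
            state = rng.choice([s for s in range(k) if s != state])
    return obs


if __name__ == "__main__":
    # Simulate from known (hypothetical) parameters, then re-estimate them.
    data = simulate(500, 0.01, [0.02, 0.2], seed=0)
    bounds = [(0.001, 0.5), (0.001, 0.5), (0.001, 0.5)]  # switch, emit_0, emit_1
    best, best_ll = pso_maximise(
        lambda p: forward_loglik(data, p[0], p[1:]), bounds, n_particles=15, n_iter=30)
    print("switch=%.4f emit=(%.3f, %.3f) logL=%.2f" % (best[0], best[1], best[2], best_ll))
```

Because the two toy states are exchangeable, the estimated emission probabilities may come back in either order. The optimiser only ever calls the likelihood function, so in principle the same loop could wrap the forward-algorithm likelihood of a real CoalHMM, at a much higher cost per evaluation.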
