Genomes as documents of evolutionary history: a probabilistic macrosynteny model for the reconstruction of ancestral genomes

Motivation: It has been argued that whole-genome duplication (WGD) exerted a profound influence on the course of evolution. To fully understand the impact of WGD, several formal algorithms have been developed for reconstructing pre-WGD gene order in yeasts and plants. However, to the best of our knowledge, these algorithms have never been successfully applied to WGD events in teleosts and vertebrates, where extensive gene shuffling and gene losses impede them.

Results: Here, we present a probabilistic model of macrosynteny (i.e. conserved linkage, or the chromosome-scale distribution of orthologs), develop a variational Bayes algorithm for inferring the structure of pre-WGD genomes, and study its estimation accuracy by simulation. Then, by applying the method to the teleost WGD, we demonstrate the effectiveness of the algorithm in a situation where gene-order reconstruction algorithms perform relatively poorly owing to a high rate of rearrangement and extensive gene losses. Our high-resolution reconstruction reveals previously overlooked small-scale rearrangements, necessitating a revision of previous views on genome structure evolution in teleosts and vertebrates.

Conclusions: We have reconstructed the structure of a pre-WGD genome by employing a variational Bayes approach that was originally developed for inferring topics from millions of text documents. Interestingly, a comparison of the macrosynteny and topic model algorithms suggests that macrosynteny can be regarded as documents on ancestral genome structure. From this perspective, the present study would seem to provide a textbook example of the prevalent metaphor that genomes are documents of evolutionary history.

Availability and implementation: The analysis data are available for download at http://www.gen.tcd.ie/molevol/supp_data/MacrosyntenyTGD.zip, and the software, written in Java, is available upon request.

Contact: yoichiro.nakatani@tcd.ie or aoife.mclysaght@tcd.ie

Supplementary information: Supplementary data are available at Bioinformatics online.
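The Conclusions compare macrosynteny inference to the variational Bayes method used for topic models. The sketch below illustrates that general technique only. It implements batch mean-field variational Bayes for latent Dirichlet allocation (LDA), following Blei, Ng and Jordan in the batch form described by Hoffman, Blei and Bach.

It is not the macrosynteny model of this study. The abstract does not say how genomes are mapped onto "documents" and "words", so the input here is simply integer word IDs with counts. Several names and choices are illustrative assumptions:

- the function names;
- the priors alpha and eta;
- the iteration counts and the seed;
- the hand-rolled digamma, which stands in for scipy.special.digamma to keep the block dependency-free.

```python
import math
import random


def digamma(x):
    """Digamma for x > 0: shift with psi(x) = psi(x + 1) - 1/x, then asymptotic series."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    return (result + math.log(x) - 0.5 / x
            - inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252)))


def expected_log_dirichlet(params):
    """E[log p_i] under Dirichlet(params) = digamma(params_i) - digamma(sum(params))."""
    total = digamma(sum(params))
    return [digamma(p) - total for p in params]


def lda_variational_bayes(docs, num_topics, vocab_size, alpha=0.1, eta=0.01,
                          outer_iters=50, inner_iters=20, seed=0):
    """Batch mean-field VB for LDA (illustrative sketch, NOT the paper's macrosynteny model).

    docs: list of {word_id: count}; alpha, eta > 0 and inner_iters >= 1 are required.
    Returns (gamma, lam): variational Dirichlet parameters for each document's
    topic proportions and for each topic's word distribution.
    """
    rng = random.Random(seed)
    # Gamma(100, 0.01) draws have mean 1; the small noise breaks topic symmetry.
    lam = [[rng.gammavariate(100.0, 0.01) for _ in range(vocab_size)]
           for _ in range(num_topics)]
    gamma = []
    for _ in range(outer_iters):
        elog_beta = [expected_log_dirichlet(row) for row in lam]
        sstats = [[0.0] * vocab_size for _ in range(num_topics)]
        gamma = []
        # E-step: fit phi (token-topic responsibilities) and gamma per document.
        for doc in docs:
            n_d = sum(doc.values())
            g = [alpha + n_d / num_topics] * num_topics
            phi = {}
            for _ in range(inner_iters):
                elog_theta = expected_log_dirichlet(g)
                for w in doc:
                    logits = [elog_theta[k] + elog_beta[k][w] for k in range(num_topics)]
                    m = max(logits)  # subtract the max for numerical stability
                    unnorm = [math.exp(v - m) for v in logits]
                    z = sum(unnorm)
                    phi[w] = [u / z for u in unnorm]
                # gamma_dk = alpha + sum_w n_dw * phi_dwk
                g = [alpha + sum(c * phi[w][k] for w, c in doc.items())
                     for k in range(num_topics)]
            gamma.append(g)
            for w, c in doc.items():
                for k in range(num_topics):
                    sstats[k][w] += c * phi[w][k]
        # M-step: lambda_kw = eta + expected number of tokens of word w assigned to topic k.
        lam = [[eta + sstats[k][w] for w in range(vocab_size)]
               for k in range(num_topics)]
    return gamma, lam
```

For example, `lda_variational_bayes([{0: 5, 1: 5, 2: 5}, {3: 5, 4: 5, 5: 5}], num_topics=2, vocab_size=6)` returns one gamma row per toy document. Each row sums to `num_topics * alpha` plus that document's token count, which is a quick sanity check on the E-step.

For corpora of millions of documents, the online variant of Hoffman et al. changes the M-step. It replaces this full M-step with a step-size-weighted update of lambda computed from mini-batches.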