The Similarity Distribution of Paralogous Gene Pairs Created by Recurrent Alternation of Polyploidization and Fractionation

We study modeling and inference problems arising from fractionation, the genome-wide process of losing one gene per duplicate pair following whole genome doubling (WGD). The work is motivated by the evolution of plants over many tens of millions of years, with their repeated cycles of genome doubling and fractionation. We focus on the frequency distribution of similarities between the two genes of a pair, taken over all the duplicate pairs in the genome. Our model is fully general: it accounts for repeated duplication, triplication or other k-tupling events (all subsumed under the term WGD), as well as a general fractionation rate in any time period among the multiple progeny of a single gene. It also handles the tendency for at least one sibling to survive fractionation in a biologically and combinatorially well-motivated way. We show how the method reduces to previously proposed models in special cases, and how it settles unresolved questions about the expected number of gene pairs tracing their ancestry back to each WGD event.
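As a rough illustration of the kind of process described above, the following Python sketch simulates one gene family through a sequence of k-tupling events, each followed by fractionation, and then counts the surviving gene pairs according to the event at which their lineages diverged. It is not the model from the paper: the function names, the per-event retention probabilities, and in particular the survival rule (one copy is always kept and each of the other copies is retained independently with probability u) are assumptions chosen only to make the idea concrete.

```python
import random
from itertools import combinations


def simulate_family(ploidies, retain_probs, rng):
    """Simulate one gene family through successive k-tupling events.

    ploidies[i] is the multiplicity of event i (2 = doubling, 3 = tripling).
    retain_probs[i] is the ASSUMED probability that each copy other than the
    guaranteed survivor is retained after event i.
    Returns the lineages of the surviving genes, one tuple of copy indices each.
    """
    lineages = [()]  # a single ancestral gene
    for r, u in zip(ploidies, retain_probs):
        survivors = []
        for lineage in lineages:
            kept = rng.randrange(r)  # assumption: one sibling always survives
            for copy in range(r):
                if copy == kept or rng.random() < u:
                    survivors.append(lineage + (copy,))
        lineages = survivors
    return lineages


def pairs_by_event(lineages, n_events):
    """Count surviving gene pairs by the event at which they diverged."""
    counts = [0] * n_events
    for a, b in combinations(lineages, 2):
        for i, (x, y) in enumerate(zip(a, b)):
            if x != y:  # first differing copy index marks the originating event
                counts[i] += 1
                break
    return counts


def mean_pairs(ploidies, retain_probs, n_families=10000, seed=0):
    """Monte Carlo estimate of the mean number of pairs per event per family."""
    rng = random.Random(seed)
    totals = [0] * len(ploidies)
    for _ in range(n_families):
        family = simulate_family(ploidies, retain_probs, rng)
        for i, c in enumerate(pairs_by_event(family, len(ploidies))):
            totals[i] += c
    return [t / n_families for t in totals]
```

For example, `mean_pairs([3, 2], [0.3, 0.5])` estimates, under these assumptions, how many surviving pairs per ancestral gene trace back to a hypothetical tripling followed by a doubling. With every retention probability set to 1 no gene is lost, and with every probability set to 0 each family keeps exactly one gene and no pairs remain.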
