Reconstruction of ancestral genomes in presence of gene gain and loss

Since the most dramatic genomic changes are caused by genome rearrangements, gene duplications, and gene gain/loss events, it is crucial to understand their mechanisms and to reconstruct the ancestral genomes of a given set of genomes. This problem was shown to be NP-complete even in the "simplest" case of three genomes, which calls for heuristic rather than exact algorithmic solutions. At the same time, a larger number of input genomes may actually simplify the problem in practice, as was illustrated earlier with MGRA, a state-of-the-art software tool for reconstructing the ancestral genomes of multiple genomes. One of the key obstacles for MGRA and similar tools is the presence of breakpoint reuse, where the same breakpoint region is broken by several different genome rearrangements in the course of evolution. Furthermore, such tools are often limited to genomes composed of the same genes, with each gene present in a single copy in every genome. This limitation makes these tools inapplicable to many biological datasets and degrades the resolution of ancestral reconstructions in diverse datasets. We address these deficiencies by extending the MGRA algorithm to genomes with unequal gene content. The resulting next-generation tool, MGRA2, can handle gene gain/loss events and shares MGRA's ability to reconstruct ancestral genomes uniquely in the case of limited breakpoint reuse. Furthermore, MGRA2 employs a number of novel heuristics to cope with higher breakpoint reuse and to process datasets inaccessible to MGRA. In practical experiments, MGRA2 shows superior performance on simulated and real genomes compared to other ancestral genome reconstruction tools.
