A unifying model of genome evolution under parsimony

Background

Parsimony and maximum likelihood methods of phylogenetic tree estimation and parsimony methods for genome rearrangements are central to the study of genome evolution, yet to date they have largely been pursued in isolation.

Results

We present a data structure called a history graph that offers a practical basis for the analysis of genome evolution. It conceptually simplifies the study of parsimonious evolutionary histories by representing both substitutions and double-cut-and-join (DCJ) rearrangements in the presence of duplications. The problem of constructing parsimonious history graphs thus subsumes related maximum parsimony problems in the fields of phylogenetic reconstruction and genome rearrangement. We show that tractable functions can be used to define upper and lower bounds on the minimum number of substitutions and DCJ rearrangements needed to explain any history graph. These bounds become tight for a special type of unambiguous history graph called an ancestral variation graph (AVG), whose combinatorial structure constrains the number of operations required. Finally, we demonstrate that for a given history graph G, a finite set of AVGs describes all parsimonious interpretations of G, and that this set can be explored with a few sampling moves.

Conclusion

This theoretical study describes a model in which the inference of genome rearrangements and phylogeny can be unified under parsimony.
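As background for the DCJ rearrangement counts mentioned above, the following is a minimal sketch of the classical DCJ distance in its simplest setting only: two circular, single-chromosome genomes with the same single-copy genes, where the distance is N − C (N genes, C cycles in the adjacency graph). It is not the history graph model of this paper and handles neither duplications nor substitutions. The function names `adjacencies` and `dcj_distance` and the signed-integer genome encoding are assumptions made for illustration.

```python
def adjacencies(genome):
    """Map each gene extremity to its neighbour in a circular genome.

    A genome is a list of signed integers; +g is read tail-to-head,
    -g head-to-tail. Extremities are (gene, 't') and (gene, 'h').
    """
    def left(g):
        return (abs(g), 't' if g > 0 else 'h')

    def right(g):
        return (abs(g), 'h' if g > 0 else 't')

    adj = {}
    n = len(genome)
    for i in range(n):
        a = right(genome[i])
        b = left(genome[(i + 1) % n])  # wrap around: circular chromosome
        adj[a] = b
        adj[b] = a
    return adj


def dcj_distance(genome_a, genome_b):
    """DCJ distance N - C for circular genomes with identical single-copy genes."""
    if sorted(map(abs, genome_a)) != sorted(map(abs, genome_b)):
        raise ValueError("genomes must contain the same single-copy genes")
    adj_a, adj_b = adjacencies(genome_a), adjacencies(genome_b)

    seen = set()
    cycles = 0
    for start in adj_a:
        if start in seen:
            continue
        cycles += 1
        x = start
        # Alternate between an adjacency of A and an adjacency of B
        # until the walk closes back on its starting extremity.
        while x not in seen:
            seen.add(x)
            y = adj_a[x]
            seen.add(y)
            x = adj_b[y]
    return len(genome_a) - cycles


if __name__ == "__main__":
    print(dcj_distance([1, 2, 3], [1, -2, 3]))       # a single inversion
    print(dcj_distance([1, 2, 3, 4], [1, 3, 2, 4]))  # two genes swapped
```

In this restricted setting the count is exact and computable in linear time; the bounds described in the abstract address the harder case in which duplications and ambiguity are present.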
