A flexible ancestral genome reconstruction method based on gapped adjacencies

Background: The "small phylogeny" problem consists in inferring the ancestral genomes associated with each internal node of a phylogenetic tree of a set of extant species. Existing methods fall into two main categories: distance-based methods, which aim to minimize the total branch length, and synteny-based (or mapping) methods, which first predict a collection of relations between ancestral markers in terms of "synteny" and then assemble this collection into a set of Contiguous Ancestral Regions (CARs). CARs predicted by synteny-based methods are likely to be more reliable, as they are deduced more directly from conservation observed in extant species. However, the challenge is to end up with a completely assembled genome.

Results: We develop a new synteny-based method that is flexible enough to handle a model of evolution involving whole genome duplication events, in addition to rearrangements, gene insertions, and losses. Ancestral relationships between markers are defined in terms of Gapped Adjacencies, i.e. pairs of markers separated by up to a given number of markers. The method improves on a previous one restricted to direct adjacencies, which achieved high accuracy for adjacency prediction but had the drawback of being overly conservative, i.e. of generating a large number of CARs. Applying our algorithm to various simulated data sets reveals good performance: we usually end up with a completely assembled genome while keeping a low error rate.

Availability: All source code is available at http://www.iro.umontreal.ca/~mabrouk.
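To make the definition of a Gapped Adjacency concrete, the following is a minimal sketch and not the authors' implementation. It assumes a chromosome is a simple list of unsigned, single-copy marker names (orientation and duplicated markers are ignored), and the function name `gapped_adjacencies` and its parameter `max_gap` are hypothetical names introduced here for illustration.

```python
def gapped_adjacencies(chromosome, max_gap):
    """Return the unordered marker pairs separated by at most max_gap markers.

    Assumption: `chromosome` is a list of distinct, unsigned marker names.
    With max_gap = 0 this reduces to direct adjacencies.
    """
    pairs = set()
    n = len(chromosome)
    for i in range(n):
        # Markers at positions i and j are separated by j - i - 1 markers.
        for j in range(i + 1, min(n, i + max_gap + 2)):
            pairs.add(tuple(sorted((chromosome[i], chromosome[j]))))
    return pairs


# Two toy extant chromosomes; marker "b" has moved in the second one.
genome_a = ["a", "b", "c", "d"]
genome_b = ["a", "c", "d", "b"]

# Pairs conserved in both genomes, for direct and for gapped adjacencies.
shared_direct = gapped_adjacencies(genome_a, 0) & gapped_adjacencies(genome_b, 0)
shared_gapped = gapped_adjacencies(genome_a, 1) & gapped_adjacencies(genome_b, 1)

print(sorted(shared_direct))
print(sorted(shared_gapped))
```

In this toy case, allowing a gap of one marker recovers more conserved pairs than direct adjacencies alone, which illustrates why a gapped definition can supply the extra evidence needed to join CARs.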
