The SCJ Small Parsimony Problem for Weighted Gene Adjacencies

Reconstructing ancestral gene orders in a given phylogeny is a classical problem in comparative genomics. Most existing methods compare conserved features of the extant genomes in the phylogeny to define potential ancestral gene adjacencies. They then either try to reconstruct all ancestral genomes under a global evolutionary parsimony criterion or, focusing on a single ancestral genome, use a scaffolding approach to select a subset of ancestral gene adjacencies, generally aiming to reduce the fragmentation of the reconstructed ancestral genome. In this paper, we describe an exact algorithm for the Small Parsimony Problem that combines both approaches. We assume that gene adjacencies at internal nodes of the species phylogeny are weighted, and we introduce an objective function defined as a convex combination of these weights and the evolutionary cost under the Single-Cut-or-Join (SCJ) model. The weights of ancestral gene adjacencies can be obtained, for example, from ancient DNA sequencing data, which provide a direct hint at the genome structure of the considered ancestor, or from a probabilistic analysis of gene adjacency evolution. We show that our problem variant is NP-hard and propose a Fixed-Parameter Tractable algorithm, based on the Sankoff-Rousseau dynamic programming algorithm, that also allows sampling co-optimal solutions. We apply our approach to mammalian and bacterial data sets of differing complexity. We show that including adjacency weights in the objective has a significant impact on reducing the fragmentation of the reconstructed ancestral gene orders. An implementation is available at http://github.com/nluhmann/PhySca.
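
To make the objective concrete, the following is a minimal sketch, not the PhySca implementation, of a Sankoff-Rousseau style dynamic program for a single gene adjacency treated in isolation. It ignores the requirement that the adjacencies chosen at a node form a valid genome, which is where the hardness of the full problem lies. The function name, the input layout, and the exact form of the weight term (penalty 1 - w for keeping an adjacency of weight w at an internal node, penalty w for discarding it) are assumptions made for illustration; the abstract only states that the objective is a convex combination, here controlled by `alpha`, of the weights and the SCJ cost.

```python
from math import inf


def min_cost_single_adjacency(tree, root, leaf_state, weight, alpha):
    """Bottom-up DP over presence (1) / absence (0) of ONE adjacency.

    tree       : dict mapping an internal node to its list of children
    leaf_state : dict mapping each leaf to 0 or 1 (observed in extant genome)
    weight     : dict mapping internal nodes to a weight in [0, 1]
                 (missing nodes default to 0.0)
    alpha      : in [0, 1]; alpha = 1 gives pure SCJ parsimony

    Assumed objective (illustrative only):
      alpha * (number of tree edges where the state changes)
      + (1 - alpha) * sum over internal nodes of
            (1 - w) if the adjacency is present, else w
    """

    def cost(node):
        children = tree.get(node, [])
        if not children:
            # A leaf is fixed to its observed state; the other state is forbidden.
            observed = leaf_state[node]
            return [0.0 if observed == 0 else inf,
                    0.0 if observed == 1 else inf]

        child_costs = [cost(child) for child in children]
        w = weight.get(node, 0.0)
        table = []
        for state in (0, 1):
            # Weight term for this internal node (assumed form, see docstring).
            total = (1 - alpha) * ((1.0 - w) if state == 1 else w)
            for cc in child_costs:
                # One cut or join on the edge whenever the two states differ.
                total += min(cc[t] + alpha * (state != t) for t in (0, 1))
            table.append(total)
        return table

    return min(cost(root))


# Hypothetical toy tree: root -> (x, c), x -> (a, b)
toy_tree = {"root": ["x", "c"], "x": ["a", "b"]}
toy_leaves = {"a": 1, "b": 1, "c": 0}
toy_weights = {"x": 0.9, "root": 0.2}
best = min_cost_single_adjacency(toy_tree, "root", toy_leaves, toy_weights, 0.5)
```

In this toy instance the high weight at `x` makes it cheap to keep the adjacency there, while the low weight at `root` favors discarding it, so lowering `alpha` shifts the optimum from pure edge-change parsimony towards the weights.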
