Sampling and counting genome rearrangement scenarios

Background: Even for moderate-size inputs, there is a tremendous number of optimal rearrangement scenarios, regardless of the model and of the specific question to be answered. Therefore, reporting a single optimal solution might be misleading and cannot be used for statistical inference. Statistically well-founded methods are necessary to sample uniformly from the solution space; a small number of samples is then sufficient for statistical inference.

Contribution: In this paper, we give a mini-review of the state of the art in sampling and counting rearrangement scenarios, focusing on the reversal, DCJ and SCJ models. In addition, we give a Gibbs sampler for sampling most parsimonious labelings of evolutionary trees under the SCJ model. The method has been implemented and tested on real-life data. The software package, together with example data, can be downloaded from http://www.renyi.hu/~miklosi/SCJ-Gibbs/
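To make the size of the solution space concrete, the following is a minimal sketch, not the method of the paper, that counts the most parsimonious labelings of a single binary character (for example, presence or absence of one adjacency) on a rooted tree with a Sankoff-style dynamic program. The tree encoding, the node names and the function `count_optimal_labelings` are assumptions made for illustration only. The sketch treats one character in isolation and ignores the SCJ requirement that the adjacencies chosen at an internal node together form a valid genome.

```python
from typing import Dict, List, Tuple

INF = float("inf")


def count_optimal_labelings(
    children: Dict[str, Tuple[str, ...]],
    leaf_state: Dict[str, int],
    root: str,
) -> Tuple[int, int]:
    """Return (minimum number of changes, number of optimal labelings)
    for one binary character on a rooted tree.

    children   : internal node -> tuple of child nodes (hypothetical encoding)
    leaf_state : leaf node -> observed state, 0 or 1
    """

    def visit(node: str) -> Tuple[List[float], List[int]]:
        # cost[s]: fewest changes in the subtree if `node` has state s
        # ways[s]: number of subtree labelings achieving cost[s]
        if node in leaf_state:
            cost: List[float] = [INF, INF]
            ways = [0, 0]
            cost[leaf_state[node]] = 0
            ways[leaf_state[node]] = 1
            return cost, ways

        cost = [0, 0]
        ways = [1, 1]
        for child in children[node]:
            c_cost, c_ways = visit(child)
            for s in (0, 1):
                # One change is paid on the edge if parent and child differ.
                options = [(c_cost[t] + (s != t), c_ways[t]) for t in (0, 1)]
                best = min(c for c, _ in options)
                cost[s] += best
                ways[s] *= sum(w for c, w in options if c == best)
        return cost, ways

    cost, ways = visit(root)
    best = min(cost)
    return int(best), sum(w for c, w in zip(cost, ways) if c == best)


# Toy tree ((l1,l2),(l3,l4)) with leaf states 1,0,1,0.
toy_children = {"root": ("A", "B"), "A": ("l1", "l2"), "B": ("l3", "l4")}
toy_leaves = {"l1": 1, "l2": 0, "l3": 1, "l4": 0}
print(count_optimal_labelings(toy_children, toy_leaves, "root"))  # → (2, 2)
```

On this toy tree one character already admits two optimal labelings. If k such characters could be labeled independently, the counts would multiply to 2^k, which illustrates why a single reported solution is uninformative; under the SCJ model the validity constraint between adjacencies makes the true count harder to obtain.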
