A unified ILP framework for core ancestral genome reconstruction problems

MOTIVATION: One of the key computational problems in comparative genomics is the reconstruction of genomes of ancestral species based on genomes of extant species. Since most dramatic changes in genomic architectures are caused by genome rearrangements, this problem is often posed as minimization of the number of genome rearrangements between extant and ancestral genomes. The basic case of three given genomes is known as the genome median problem. Whole genome duplications (WGDs) represent another type of dramatic evolutionary event and motivate the reconstruction of pre-duplicated ancestral genomes, referred to as the genome halving problem. The generalization of WGDs to whole genome multiplication events leads to the genome aliquoting problem.

RESULTS: In the present study, we propose polynomial-size integer linear programming (ILP) formulations for the aforementioned problems. We further obtain such formulations for the restricted and conserved versions of the median and halving problems, which have been recently introduced to improve the biological relevance of the solutions. Extensive evaluation of solutions to the different ILP problems demonstrates their good accuracy. Furthermore, since the ILP formulations for the conserved versions have linear size, they provide a novel practical approach to ancestral genome reconstruction, which combines the advantages of homology- and rearrangement-based methods.

AVAILABILITY: Code and data are available at https://github.com/AvdeevPavel/ILP-WGD-reconstructor.
