Algorithms for Computing the Family-Free Genomic Similarity Under DCJ

Genomic similarity is a large-scale measure for comparing two given genomes. In this work we study the NP-hard problem of computing the genomic similarity under the double cut and join (DCJ) model in a setting that does not assume that the genes of the compared genomes are grouped into gene families. This problem is called family-free DCJ similarity. We propose an exact integer linear programming (ILP) algorithm to solve it, show that the problem is APX-hard, and present three combinatorial heuristics, together with computational experiments comparing their results to those of the ILP. Experiments on simulated datasets show that the proposed heuristics are very fast and, for some instances, competitive with the ILP algorithm.
