On the Family-Free DCJ Distance

Structural variation in genomes can be revealed by many (dis)similarity measures. Rearrangement operations, such as the so-called double-cut-and-join (DCJ), are large-scale mutations that can create complex changes and produce such variations in genomes. A basic task in comparative genomics is to find the rearrangement distance between two given genomes, i.e., the minimum number of rearrangement operations that transform one given genome into the other. In a family-based setting, genes are grouped into gene families, and efficient algorithms have already been proposed to compute the DCJ distance between two given genomes. In this work we introduce the problem of computing the DCJ distance of two given genomes without prior gene family assignment, directly using the pairwise similarity between genes. We propose a new family-free DCJ distance, prove that the family-free DCJ distance problem is APX-hard, and provide an integer linear program for its solution.
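
For orientation, the family-based DCJ distance that this work starts from can be computed from the adjacency graph with the formula of Bergeron, Mixtacki and Stoye (2006): d_DCJ(A, B) = n - (c + i/2), where n is the number of genes, c the number of cycles and i the number of odd paths. The sketch below is a minimal illustration under the assumption that every gene occurs exactly once in both genomes (no duplicates). It is not the family-free distance proposed here, which works from pairwise gene similarities instead. The genome encoding (a list of signed-gene chromosomes with a circular flag) and the function names are hypothetical choices made for this example.

```python
from collections import defaultdict

# Hypothetical encoding: a genome is a list of chromosomes, each (signed gene list, is_circular).
# Gene g has a tail extremity (g, "t") and a head extremity (g, "h").


def adjacency_vertices(genome):
    """Return the adjacencies (2 extremities) and telomeres (1 extremity) of a genome."""
    vertices = []
    for genes, circular in genome:
        # (left, right) extremities of each gene in reading order; -g is read head first
        ends = [((g, "t"), (g, "h")) if g > 0 else ((-g, "h"), (-g, "t")) for g in genes]
        for (_, right), (left, _) in zip(ends, ends[1:]):
            vertices.append(frozenset([right, left]))
        if circular:
            vertices.append(frozenset([ends[-1][1], ends[0][0]]))
        else:
            vertices.append(frozenset([ends[0][0]]))
            vertices.append(frozenset([ends[-1][1]]))
    return vertices


def dcj_distance(a, b):
    """Family-based DCJ distance n - (c + i/2) via the adjacency graph (no duplicate genes)."""
    va, vb = adjacency_vertices(a), adjacency_vertices(b)
    vertices = va + vb  # indices below len(va) are vertices of genome a
    parent = list(range(len(vertices)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    owners = defaultdict(list)  # extremity -> indices of the vertices that contain it
    for idx, v in enumerate(vertices):
        for x in v:
            owners[x].append(idx)
    for x, idxs in owners.items():
        # each extremity must lie in exactly one vertex of a and one vertex of b
        if len(idxs) != 2 or not idxs[0] < len(va) <= idxs[1]:
            raise ValueError(f"extremity {x} must occur once in each genome")
        u, w = idxs
        parent[find(u)] = find(w)  # the edge joins the a-vertex and the b-vertex sharing x

    tel_a, tel_b = defaultdict(int), defaultdict(int)
    roots = set()
    for idx, v in enumerate(vertices):
        r = find(idx)
        roots.add(r)
        if len(v) == 1:  # telomere
            (tel_a if idx < len(va) else tel_b)[r] += 1
    # every component is a path or a cycle; a component without telomeres is a cycle
    cycles = sum(1 for r in roots if tel_a[r] + tel_b[r] == 0)
    # an odd path joins a telomere of a with a telomere of b
    odd_paths = sum(1 for r in roots if tel_a[r] == 1 and tel_b[r] == 1)
    n = len({gene for gene, _ in owners})
    return n - (cycles + odd_paths // 2)  # odd_paths is always even
```

For example, `dcj_distance([([1, 2, 3], False)], [([1, -2, 3], False)])` returns 1: the adjacency graph has one cycle and two odd paths, so the distance is 3 - (1 + 2/2), matching the single inversion of gene 2.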
