Efficient Tools for Computing the Number of Breakpoints and the Number of Adjacencies between Two Genomes with Duplicate Genes

Comparing genomes of different species is a fundamental problem in comparative genomics. Recent research has introduced several measures between pairs of genomes, for example the reversal distance, the number of breakpoints, and the number of common or conserved intervals. However, classical methods for computing such measures are seriously compromised when genomes contain several copies of the same gene scattered across them. Most approaches to overcome this difficulty are based either on the exemplar model, which keeps exactly one copy of each duplicated gene in each genome, or on the maximum matching model, which keeps as many copies as possible of each duplicated gene. The goal is then to find an exemplar matching (respectively, a maximum matching) that optimizes the measure under study. Unfortunately, in the presence of duplications, this problem turns out to be NP-hard for each of the above-mentioned measures. In this paper, we propose to compute the minimum number of breakpoints and the maximum number of adjacencies between two genomes with duplications using two different approaches. The first is an exact, generic 0-1 linear programming approach; the second is a collection of three heuristics. Each approach is applied to each problem under each of the following models: the exemplar model, the maximum matching model, and an intermediate model that we introduce here. All these programs are run on a well-known public benchmark dataset of gamma-Proteobacteria, and their performance is discussed.
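To make the exemplar model concrete, here is a small brute-force sketch in Python. It is not the paper's 0-1 linear programming approach or any of its heuristics: it simply enumerates every exemplar of two tiny genomes and keeps the pair with the fewest breakpoints, which is only feasible for very small inputs given that the problem is NP-hard. Several assumptions are not taken from the abstract: a genome is a list of signed integers whose absolute value names the gene family and whose sign gives its orientation; both genomes contain the same gene families; a consecutive pair (x, y) counts as an adjacency when (x, y) or (-y, -x) is consecutive in the other genome; and no framing genes are added at the ends. All function names are hypothetical.

```python
from itertools import product

# Assumption: a genome is a list of signed ints; abs(x) is the gene family and
# the sign is the orientation. Function names are illustrative, not from the paper.


def adjacencies(a: list[int], b: list[int]) -> int:
    """Count consecutive pairs (x, y) of `a` that occur in `b` as (x, y) or as (-y, -x)."""
    pairs_b = set(zip(b, b[1:]))
    return sum(1 for x, y in zip(a, a[1:]) if (x, y) in pairs_b or (-y, -x) in pairs_b)


def breakpoints(a: list[int], b: list[int]) -> int:
    """Consecutive pairs of `a` that are not adjacencies (no framing genes assumed)."""
    return max(len(a) - 1, 0) - adjacencies(a, b)


def exemplars(g: list[int]):
    """Yield every exemplar of `g`: exactly one occurrence kept per gene family."""
    positions: dict[int, list[int]] = {}
    for i, x in enumerate(g):
        positions.setdefault(abs(x), []).append(i)
    for choice in product(*positions.values()):
        kept = set(choice)
        yield [x for i, x in enumerate(g) if i in kept]


def min_exemplar_breakpoints(g: list[int], h: list[int]) -> int:
    """Brute force over all pairs of exemplars; exponential, for tiny inputs only.
    Assumes g and h contain the same gene families."""
    return min(breakpoints(eg, eh) for eg in exemplars(g) for eh in exemplars(h))


# Gene family 2 is duplicated in g; keeping its forward copy removes all breakpoints.
print(min_exemplar_breakpoints([1, 2, -2, 3], [1, 2, 3]))  # 0
# Here neither exemplar of g, [1, -2, 3] or [1, 3, 2], shares a pair with [1, 2, 3].
print(min_exemplar_breakpoints([1, -2, 3, 2], [1, 2, 3]))  # 2
```

Under these assumptions both exemplars are permutations of the same gene families, so within this sketch breakpoints and adjacencies always sum to the exemplar length minus one, and minimizing one maximizes the other.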
