Finding Nested Common Intervals Efficiently

In this article, we study the problem of efficiently finding gene clusters formalized as nested common intervals between two genomes, represented either as permutations or as sequences. For permutations, we give several output-sensitive algorithms, that is, algorithms whose running time depends on the size of the actual output rather than on the worst-case output size. We first provide a straightforward cubic-time algorithm for finding all nested common intervals. We then reduce this complexity with a quadratic-time algorithm that computes an irredundant output. A third algorithm shows that finding only the maximal nested common intervals can be done in linear time. Finally, we prove that finding approximate nested common intervals is fixed-parameter tractable. For sequences, we provide solutions (modifications of previously defined algorithms and one new algorithm) for different variants of the problem, depending on how duplicated genes are to be treated. These include a polynomial-time algorithm for a variant that requires a matching of the genes in the cluster, a setting that often leads to hardness for other problems.
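To make the permutation setting concrete, the following is a minimal sketch and not one of the algorithms from the article. It assumes the usual definitions, which the abstract does not spell out: a common interval is a set of genes that is contiguous in both permutations, and a common interval is nested if it has size 1 or contains a nested common interval with one element fewer. The function name `nested_common_intervals` and the dynamic-programming approach are illustrative choices; the sketch uses quadratic time and space and makes no attempt at the output-sensitive bounds described above.

```python
def nested_common_intervals(a, b):
    """Return all nested common intervals of permutations a and b.

    Intervals are reported as (i, j) index pairs into a, inclusive.
    Assumed definition: a common interval is nested if it has size 1
    or contains a nested common interval of size one less.
    """
    n = len(a)
    pos_b = {gene: k for k, gene in enumerate(b)}

    # common[i][j] is True when a[i..j] is also contiguous in b,
    # i.e. the span of its positions in b equals its length.
    common = [[False] * n for _ in range(n)]
    for i in range(n):
        lo = hi = pos_b[a[i]]
        common[i][i] = True
        for j in range(i + 1, n):
            p = pos_b[a[j]]
            lo, hi = min(lo, p), max(hi, p)
            common[i][j] = (hi - lo == j - i)

    # Fill nested[i][j] by increasing interval length: an interval of a
    # can only shrink by one element from its left or its right end.
    nested = [[False] * n for _ in range(n)]
    result = []
    for length in range(1, n + 1):
        for i in range(n - length + 1):
            j = i + length - 1
            if length == 1:
                nested[i][j] = True
            elif common[i][j] and (nested[i + 1][j] or nested[i][j - 1]):
                nested[i][j] = True
            if nested[i][j]:
                result.append((i, j))
    return result
```

For example, with `a = [1, 2, 3, 4]` and `b = [2, 1, 4, 3]`, the whole set is a common interval but not a nested one, because neither `{1, 2, 3}` nor `{2, 3, 4}` is contiguous in `b`.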
