Output-Sensitive Algorithms for Finding the Nested Common Intervals of Two General Sequences

This paper addresses the problem of finding all nested common intervals of two general sequences. To accommodate different treatments of duplicate genes, Blin et al. introduced three models that define nested common intervals of two sequences: the uniqueness, the free-inclusion, and the bijection models. We consider all three models. For the uniqueness and the bijection models, we give O(n + N<sub>out</sub>)-time algorithms, where N<sub>out</sub> denotes the size of the output. For the free-inclusion model, we give an O(n<sup>1+ε</sup> + N<sub>out</sub>)-time algorithm, where ε > 0 is an arbitrarily small constant. We also present an upper bound on the size of the output for each model. For the uniqueness and the free-inclusion models, we show that N<sub>out</sub> = O(n<sup>2</sup>). Let C = Σ<sub>g∈Γ</sub> o<sub>1</sub>(g)o<sub>2</sub>(g), where Γ is the set of distinct genes, and o<sub>1</sub>(g) and o<sub>2</sub>(g) are, respectively, the numbers of copies of g in the two given sequences. For the bijection model, we show that N<sub>out</sub> = O(Cn). We also study the problem of finding all approximate nested common intervals of two sequences under the bijection model. An O(δn + N<sub>out</sub>)-time algorithm is presented, where δ denotes the maximum number of allowed gaps. In addition, we show that for this problem N<sub>out</sub> = O(δn<sup>3</sup>).
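As an illustration of the quantity C that appears in the bijection-model bound, the following minimal sketch computes the sum of the products of per-gene copy counts for two sequences. The function name `occurrence_product_sum` and the representation of a sequence as a Python iterable of gene labels are assumptions made for this example; they do not come from the paper, and the sketch does not implement any of the paper's algorithms.

```python
from collections import Counter


def occurrence_product_sum(seq1, seq2):
    """Return C, the sum over distinct genes g of o1(g) * o2(g).

    o1(g) and o2(g) are the numbers of copies of g in seq1 and seq2.
    Hypothetical helper for illustration only.
    """
    o1 = Counter(seq1)
    o2 = Counter(seq2)
    # A gene absent from either sequence contributes 0 to the sum,
    # so iterating over the genes of seq1 is sufficient.
    return sum(count * o2[gene] for gene, count in o1.items())


if __name__ == "__main__":
    # a: 2 * 2, b: 2 * 1, c: 1 * 1, d: 0 * 1
    print(occurrence_product_sum("abcab", "bacda"))
```

When both sequences are permutations of the same n genes, every product is 1 and C equals n, so the O(Cn) bound coincides with the O(n<sup>2</sup>) bound stated for the other two models.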
