Determining DNA Sequence Similarity Using Maximum Independent Set Algorithms for Interval Graphs

Motivated by the problem of finding similarities in DNA and amino acid sequences, we study a particular class of two-dimensional interval graphs and present an algorithm that finds a maximum weight "increasing" independent set for this class. Our class of interval graphs is a subclass of the graphs with interval number 2. The algorithm runs in O(n log n) time, where n is the number of nodes, and its implementation provides a practical solution to a common problem in genetic sequence comparison.
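The abstract does not spell out the algorithm, so what follows is only a minimal sketch under stated assumptions. Assume each node is a pair of intervals, [a, b] on the first sequence and [c, d] on the second (for example, a local match between the two sequences), carrying a non-negative weight w. Also assume "increasing" means that for any two chosen nodes, one lies strictly before the other on both sequences. Under these assumptions, a maximum weight increasing independent set is a maximum weight chain. A standard sweep over the first sequence, paired with a prefix-maximum Fenwick (binary indexed) tree keyed on the second sequence, finds such a chain in O(n log n) time. This is a generic chaining technique and not necessarily the authors' method. The names `Node` and `max_weight_increasing_set` are hypothetical.

```python
from bisect import bisect_left
from typing import List, Tuple

# Hypothetical node layout: (a, b, c, d, w) = interval [a, b] on sequence 1,
# interval [c, d] on sequence 2, weight w >= 0.
Node = Tuple[int, int, int, int, float]


def max_weight_increasing_set(nodes: List[Node]) -> Tuple[float, List[int]]:
    """Return (total weight, node indices) of a maximum weight chain.

    Assumption (not from the paper): node j may precede node i
    iff b_j < a_i and d_j < c_i.
    """
    n = len(nodes)
    if n == 0:
        return 0.0, []

    # Compress right endpoints on sequence 2 to Fenwick positions 1..m.
    ys = sorted({d for (_, _, _, d, _) in nodes})
    m = len(ys)
    tree = [(0.0, -1)] * (m + 1)  # prefix-max tree of (chain score, node index)

    def update(pos: int, item: Tuple[float, int]) -> None:
        # Raise every tree cell whose range covers position pos.
        while pos <= m:
            if item > tree[pos]:
                tree[pos] = item
            pos += pos & -pos

    def query(k: int) -> Tuple[float, int]:
        # Best (score, index) stored at positions 1..k; (0.0, -1) if none.
        best = (0.0, -1)
        while k > 0:
            if tree[k] > best:
                best = tree[k]
            k -= k & -k
        return best

    # Sweep sequence 1. Starts (kind 0) sort before ends (kind 1) at equal
    # coordinates, so a node ending at x never chains into one starting at x.
    events = []
    for i, (a, b, _, _, _) in enumerate(nodes):
        events.append((a, 0, i))
        events.append((b, 1, i))
    events.sort()

    score = [0.0] * n
    prev = [-1] * n
    for _, kind, i in events:
        _, _, c, d, w = nodes[i]
        if kind == 0:
            # The tree holds only nodes with b_j < a_i; keep those with d_j < c_i.
            best, j = query(bisect_left(ys, c))
            score[i] = w + best
            prev[i] = j
        else:
            # Node i is now closed on sequence 1; publish its chain score.
            update(bisect_left(ys, d) + 1, (score[i], i))

    # Trace the best chain back through the predecessor links.
    last = max(range(n), key=score.__getitem__)
    total = score[last]
    chain = []
    i = last
    while i != -1:
        chain.append(i)
        i = prev[i]
    chain.reverse()
    return total, chain
```

For example, `max_weight_increasing_set([(0, 2, 0, 2, 3), (3, 5, 3, 5, 4), (1, 4, 6, 8, 5), (6, 7, 6, 7, 2)])` should return `(9.0, [0, 1, 3])`. Node 2 overlaps node 0 on the first sequence and ends after node 3 begins on the second, so it cannot join that chain. Each node is inserted into the tree only when its right end is passed, which keeps the sweep to one sort plus O(log n) work per node.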