Combinatorial Approaches to Finding Subtle Signals in DNA Sequences

Signal finding (pattern discovery in unaligned DNA sequences) is a fundamental problem in both computer science and molecular biology, with important applications in locating regulatory sites and identifying drug targets. Despite many studies, the problem is far from resolved: most signals in DNA sequences are so complicated that we do not yet have good models or reliable algorithms for recognizing them. We complement existing statistical and machine learning approaches with a combinatorial approach that has proved successful in identifying very subtle signals.
