Applicability of the multiple alignment algorithm for detection of weak patterns: periodically distributed DNA pattern as a study case

MOTIVATION: The nucleosome DNA positioning pattern is known to be one of the weakest (most highly degenerate) sequence patterns. The alignment procedure recently developed to extract such a pattern is based on statistical matching of the sequences, and its success depends on the pattern-to-background ratio in the individual sequences and in the generated pattern. The heuristic nature of the method and the distinctive properties of the pattern raise the question of how efficient and sensitive the procedure is. This paper presents a method for verifying this multiple sequence alignment algorithm.

RESULTS: To verify the applicability of the multiple alignment approach, we constructed a set of sequences carrying a hidden pattern. The pattern was represented by weak ('signal') oscillations in the occurrence of AA and TT dinucleotides along otherwise random sequences. Only a few dinucleotides in any given 145-base-long sequence correspond to the signal, appearing in about the same phase within the simulated periodic pattern. The novelty of our simulation approach is that we simulated the database as a whole, as opposed to simulating each sequence separately. The correlation between the hidden pattern and a sequence from the database is negligible on average, but our statistical multicycle alignment procedure produced a pattern with attributes very close to the simulated ones. The accuracy of the procedure was tested and calibrated. The presence in a typical sequence of as few as three dinucleotides corresponding to the signal is sufficient to generate (detect) the pattern hidden in a collection of 204 sequences.
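The simulated database described above can be illustrated with a minimal sketch. It is not the authors' procedure: the abstract does not give the period, the oscillation amplitude, or the background model, so the values below (a period of 10.4 bases, a cosine profile with amplitude 1.0, uniformly random background bases) are assumptions, and the function names `periodic_profile`, `simulate_database`, and `aa_tt_counts` are hypothetical. Only the sequence length (145), the collection size (204), and the roughly three signal dinucleotides per typical sequence come from the text. Signal placements are drawn for the database as a whole, so an individual sequence may receive more or fewer than three.

```python
import math
import random


def periodic_profile(length, period, amplitude):
    """Relative weight of placing a signal dinucleotide at each start position.

    Assumption: a cosine oscillation; the abstract does not specify the shape.
    """
    return [1.0 + amplitude * math.cos(2.0 * math.pi * i / period)
            for i in range(length - 1)]


def simulate_database(n_seqs=204, length=145, signal_per_seq=3,
                      period=10.4, amplitude=1.0, seed=0):
    """Simulate the whole database at once rather than sequence by sequence."""
    rng = random.Random(seed)
    # Background: uniformly random bases (an assumption of this sketch).
    seqs = [[rng.choice("ACGT") for _ in range(length)] for _ in range(n_seqs)]
    weights = periodic_profile(length, period, amplitude)
    positions = list(range(length - 1))
    # Distribute the total signal over the database as a whole: each placement
    # picks a random sequence and a position drawn from the periodic profile.
    for _ in range(n_seqs * signal_per_seq):
        s = rng.randrange(n_seqs)
        p = rng.choices(positions, weights=weights, k=1)[0]
        first, second = rng.choice(("AA", "TT"))
        seqs[s][p], seqs[s][p + 1] = first, second
    return ["".join(s) for s in seqs]


def aa_tt_counts(seqs):
    """Count AA and TT occurrences at each start position across the database."""
    length = len(seqs[0])
    counts = [0] * (length - 1)
    for s in seqs:
        for i in range(length - 1):
            if s[i:i + 2] in ("AA", "TT"):
                counts[i] += 1
    return counts


database = simulate_database()
profile_counts = aa_tt_counts(database)
```

Summing AA and TT occurrences by position over all sequences, as `aa_tt_counts` does, is one way to inspect the database-level pattern, whereas any single sequence carries only a few signal dinucleotides. In this sketch the sequences are already in register; testing an alignment procedure would additionally require hiding the phase, for example by shifting each sequence by an unknown offset.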
