Methods for discovering novel motifs in nucleic acid sequences

We describe a computer tool to aid the discovery of new motifs in nucleic acid sequences. A typical use is to analyse a set of upstream regions from a family of related genes in order to find possible control sequences. The heart of the method is the creation of dictionaries of related subsequences. These dictionaries can then be analysed to find the commonest or best-defined subsequences, those that occur in the greatest number of different sequences, or those that lie in equivalent positions within the family. We show the application of the method to a set of E. coli promoter sequences.
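The dictionary idea can be illustrated with a minimal sketch. It is not the tool described here. The fixed word length `k`, the use of Hamming distance with a `max_mismatch` threshold as the definition of "related", and the function names `build_dictionary` and `rank_entries` are all assumptions made for illustration. Each word observed in the input becomes a dictionary entry that collects every related occurrence, and entries are then ranked by the number of different sequences they occur in.

```python
from collections import defaultdict


def hamming(a, b):
    """Number of mismatching positions between two equal-length words."""
    return sum(x != y for x, y in zip(a, b))


def build_dictionary(seqs, k, max_mismatch=1):
    """Build a dictionary of related subsequences (illustrative only).

    Each key is a k-long word seen in the input. Its value lists every
    occurrence (sequence index, position, word) lying within
    max_mismatch mismatches of the key. Hamming distance is an assumed
    stand-in for whatever relatedness measure a real tool would use.
    """
    occurrences = [
        (i, p, s[p:p + k])
        for i, s in enumerate(seqs)
        for p in range(len(s) - k + 1)
    ]
    words = {w for _, _, w in occurrences}
    dictionary = defaultdict(list)
    for key in words:
        for i, p, w in occurrences:
            if hamming(key, w) <= max_mismatch:
                dictionary[key].append((i, p, w))
    return dict(dictionary)


def rank_entries(dictionary):
    """Return (key, n_sequences, n_occurrences) rows, best first.

    Entries found in the most different sequences come first; ties are
    broken by total occurrence count and then alphabetically.
    """
    rows = []
    for key, hits in dictionary.items():
        n_seqs = len({i for i, _, _ in hits})
        rows.append((key, n_seqs, len(hits)))
    rows.sort(key=lambda r: (-r[1], -r[2], r[0]))
    return rows


if __name__ == "__main__":
    # Toy sequences invented for this sketch, not real promoter data.
    toy = ["ACGTATAATGC", "GGTATAATCCA", "TTTATACTGGA"]
    for row in rank_entries(build_dictionary(toy, k=6))[:5]:
        print(row)
```

In the toy input, the entry for `TATAAT` gathers one hit from each of the three sequences, including the one-mismatch variant `TATACT`. The positions stored with each hit could be compared across sequences to look for entries in equivalent positions, which this sketch leaves out.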
