Identification of consensus patterns in unaligned DNA sequences known to be functionally related

We have developed a method for identifying consensus patterns in a set of unaligned DNA sequences known to bind a common protein or to have some other common biochemical function. The method is based on a matrix representation of binding site patterns. Each row of the matrix represents one of the four possible bases, each column represents one position of the binding site, and each element is determined by the frequency with which the indicated base occurs at the indicated position. The goal of the method is to find the most significant matrix (that is, the one with the lowest probability of occurring by chance) among all the matrices that can be formed from the set of related sequences. The reliability of the method improves with the number of sequences, while the time required increases only linearly with the number of sequences. To test the method, we analysed 11 DNA sequences containing promoters regulated by the Escherichia coli LexA protein. The matrices we found were consistent with the known consensus sequence and could distinguish the generally accepted LexA binding sites from other DNA sequences.
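The matrix representation described above can be illustrated with a minimal Python sketch. It assumes the candidate sites have already been aligned and have equal length, and it scores a matrix by its information content (relative entropy against a uniform background of 0.25 per base). That score is an assumed stand-in for "probability of occurring by chance"; the abstract does not specify the significance measure, and the function names `frequency_matrix` and `information_content` are hypothetical.

```python
import math

BASES = "ACGT"  # row order of the matrix: A, C, G, T


def frequency_matrix(sites):
    """Build a 4 x L matrix of base frequencies from equal-length sites.

    Rows correspond to the bases A, C, G, T; columns to site positions.
    """
    length = len(sites[0])
    if any(len(site) != length for site in sites):
        raise ValueError("all sites must have the same length")
    counts = [[0] * length for _ in BASES]
    for site in sites:
        for pos, base in enumerate(site.upper()):
            counts[BASES.index(base)][pos] += 1
    n = len(sites)
    return [[c / n for c in row] for row in counts]


def information_content(matrix, background=0.25):
    """Sum f * log2(f / background) over all matrix elements, in bits.

    Assumption: a uniform background; this is an illustrative score,
    not necessarily the significance measure used by the method.
    """
    total = 0.0
    for row in matrix:
        for f in row:
            if f > 0:  # zero-frequency terms contribute nothing
                total += f * math.log2(f / background)
    return total


if __name__ == "__main__":
    toy_sites = ["ACGT", "ACGT", "ACGA"]  # hypothetical sites, not real data
    m = frequency_matrix(toy_sites)
    for base, row in zip(BASES, m):
        print(base, row)
    print("information content (bits):", information_content(m))
```

Under this score, a set of identical four-base sites reaches the maximum of 2 bits per position (8 bits in total), while positions where all four bases are equally frequent contribute 0 bits, so more conserved alignments rank higher.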
