Scanning sequences after Gibbs sampling to find multiple occurrences of functional elements

Background: Many DNA regulatory elements occur as multiple instances within a target promoter. Gibbs sampling programs for finding DNA regulatory elements de novo can be prohibitively slow in locating all instances of such an element in a sequence set.

Results: We describe an improvement to the A-GLAM computer program, which predicts regulatory elements within DNA sequences with Gibbs sampling. The improvement adds an optional "scanning step" after Gibbs sampling. Gibbs sampling produces a position-specific scoring matrix (PSSM). The new scanning step resembles an iterative PSI-BLAST search based on the PSSM. First, it assigns an "individual score" to each subsequence of appropriate length within the input sequences using the initial PSSM. Second, it computes an E-value from each individual score, to assess the agreement between the corresponding subsequence and the PSSM. Third, it permits subsequences with E-values falling below a threshold to contribute to the underlying PSSM, which is then updated using the Bayesian calculus. A-GLAM iterates its scanning step to convergence, at which point no new subsequences contribute to the PSSM. After convergence, A-GLAM reports predicted regulatory elements within each sequence in order of increasing E-values, so users have a statistical evaluation of the predicted elements in a convenient presentation. Thus, although the Gibbs sampling step in A-GLAM finds at most one regulatory element per input sequence, the scanning step can now rapidly locate further instances of the element in each sequence.

Conclusion: Datasets from experiments determining the binding sites of transcription factors were used to evaluate the improvement to A-GLAM. Typically, the datasets included several sequences containing multiple instances of a regulatory motif. The improvements to A-GLAM permitted it to predict the multiple instances.
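
The scanning loop described in the abstract (score every window, convert scores to E-values, admit windows below a threshold, update the PSSM, repeat until nothing new is admitted) can be illustrated with a minimal Python sketch. It is not A-GLAM's code, and several pieces are assumptions made only for illustration: the function names, the uniform background, the per-base pseudocount standing in for the Bayesian update, and the E-value, which is taken here as the number of windows times an exact tail probability obtained by enumerating all words of the motif width (practical only for short motifs). Overlapping windows are not treated specially.

```python
import math
from itertools import product

ALPHABET = "ACGT"
BACKGROUND = {base: 0.25 for base in ALPHABET}  # assumption: uniform background


def build_pssm(sites, pseudocount=1.0):
    """Log-odds PSSM from aligned sites; pseudocounts stand in for a Bayesian prior."""
    total = len(sites) + pseudocount * len(ALPHABET)
    pssm = []
    for column in zip(*sites):
        pssm.append({
            base: math.log2((column.count(base) + pseudocount) / total / BACKGROUND[base])
            for base in ALPHABET
        })
    return pssm


def score(pssm, word):
    """Individual score of one window: sum of per-column log-odds."""
    return sum(column[base] for column, base in zip(pssm, word))


def tail_probability(pssm, observed):
    """Background probability of a word scoring at least `observed`.

    Exhaustive enumeration over 4**width words, so only usable for short motifs.
    """
    total = 0.0
    for word in product(ALPHABET, repeat=len(pssm)):
        if score(pssm, word) >= observed - 1e-9:
            total += math.prod(BACKGROUND[base] for base in word)
    return total


def scan_for_sites(sequences, seed_positions, width, e_threshold=0.05, max_rounds=20):
    """Iterate scanning until no new window falls below the E-value threshold.

    `seed_positions` holds (sequence index, offset) pairs, e.g. one site per
    sequence as a Gibbs sampling step would provide.
    Returns (E-value, sequence index, offset) tuples in order of increasing E-value.
    """
    windows = [(i, j, seq[j:j + width])
               for i, seq in enumerate(sequences)
               for j in range(len(seq) - width + 1)]
    accepted = set(seed_positions)
    evalue = {}
    for _ in range(max_rounds):
        pssm = build_pssm([sequences[i][j:j + width] for i, j in sorted(accepted)])
        evalue = {(i, j): len(windows) * tail_probability(pssm, score(pssm, word))
                  for i, j, word in windows}
        new = {pos for pos, e in evalue.items() if e < e_threshold} - accepted
        if not new:  # convergence: no new subsequence contributes to the PSSM
            break
        accepted |= new
    return sorted((evalue[pos], pos[0], pos[1]) for pos in accepted)


# Toy data (invented for illustration): three sequences, two of which carry
# a second copy of the 6-mer TGACTC beyond the seeded site.
SEQUENCES = [
    "AATGACTCGGTTTGACTCAA",
    "CCGTTGACTCATATGCGCTA",
    "TGACTCCCAGGATGACTCGT",
]
SEEDS = [(0, 2), (1, 4), (2, 0)]

if __name__ == "__main__":
    for e, seq_index, offset in scan_for_sites(SEQUENCES, SEEDS, width=6):
        print(seq_index, offset, SEQUENCES[seq_index][offset:offset + 6], f"E={e:.3g}")
```

In this toy run the seeds contribute one site per sequence, and the scan is what recovers the second copies in the first and third sequences. A real implementation would need an E-value calculation that scales to longer motifs and realistic background models.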
