The value of position-specific priors in motif discovery using MEME

Background
Position-specific priors have been shown to be a flexible and elegant way to extend the power of Gibbs sampler-based motif discovery algorithms. Information of many types (including sequence conservation, nucleosome positioning, and negative examples) can be converted into a prior over the location of motif sites, which then guides the sequence motif discovery algorithm. This approach has been shown to confer many of the benefits of conservation-based and discriminative motif discovery approaches on Gibbs sampler-based motif discovery methods, but it has not previously been studied with methods based on expectation maximization (EM).

Results
We extend the popular EM-based MEME algorithm to utilize position-specific priors and demonstrate their effectiveness for discovering transcription factor (TF) motifs in yeast and mouse DNA sequences. Utilizing a discriminative, conservation-based prior dramatically improves MEME's ability to discover motifs in 156 yeast TF ChIP-chip datasets, more than doubling the number of datasets where it finds the correct motif. On these datasets, MEME using the prior has a higher success rate than eight other conservation-based motif discovery approaches. We also show that the same type of prior improves the accuracy of motifs discovered by MEME in mouse TF ChIP-seq data, and that these motifs tend to be of slightly higher quality than those found by a Gibbs sampling algorithm using the same prior.

Conclusions
We conclude that using position-specific priors can substantially increase the power of EM-based motif discovery algorithms such as MEME.
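The idea summarized in this abstract, a prior over site locations steering an EM motif search, can be illustrated with a small sketch. It assumes the one-occurrence-per-sequence (OOPS) model, a fixed background base distribution, and a prior supplied directly as nonnegative per-start weights; all function names and the prior format are hypothetical, and this is a generic EM illustration rather than MEME's actual implementation or interface. In the E-step, the posterior that a sequence's site starts at position j is proportional to the prior weight at j times the likelihood ratio of that window under the motif versus the background; the M-step re-estimates the position weight matrix (PWM) from posterior-weighted counts.

```python
import math

BASES = "ACGT"


def site_log_odds(seq, start, pwm, background):
    """Log-likelihood ratio (motif vs. background) of the window starting at `start`."""
    return sum(
        math.log(pwm[k][base]) - math.log(background[base])
        for k, base in enumerate(seq[start:start + len(pwm)])
    )


def e_step(seqs, pwm, background, priors):
    """OOPS E-step: posterior over the single site start in each sequence.

    priors[i][j] is a nonnegative weight (hypothetical input format) for sequence
    i's site starting at j; it need not sum to 1 but must be positive somewhere.
    Posterior is proportional to prior * likelihood ratio, computed in log space.
    """
    width = len(pwm)
    posteriors = []
    for seq, prior in zip(seqs, priors):
        n_starts = len(seq) - width + 1
        log_w = [
            math.log(prior[j]) + site_log_odds(seq, j, pwm, background)
            if prior[j] > 0 else -math.inf
            for j in range(n_starts)
        ]
        top = max(log_w)  # subtract the max before exponentiating to avoid underflow
        w = [math.exp(x - top) for x in log_w]
        total = sum(w)
        posteriors.append([x / total for x in w])
    return posteriors


def m_step(seqs, posteriors, width, pseudocount=0.1):
    """Re-estimate the PWM from posterior-weighted site counts plus pseudocounts."""
    counts = [{b: pseudocount for b in BASES} for _ in range(width)]
    for seq, post in zip(seqs, posteriors):
        for j, z in enumerate(post):
            for k in range(width):
                counts[k][seq[j + k]] += z
    return [{b: col[b] / sum(col.values()) for b in BASES} for col in counts]


def em_with_prior(seqs, priors, width, background, n_iter=50):
    """Run EM from a uniform PWM. With a uniform background, the first E-step
    returns the normalized prior, so the prior alone seeds the motif."""
    pwm = [{b: 0.25 for b in BASES} for _ in range(width)]
    posteriors = []
    for _ in range(n_iter):
        posteriors = e_step(seqs, pwm, background, priors)
        pwm = m_step(seqs, posteriors, width)
    return pwm, posteriors


if __name__ == "__main__":
    seqs = ["TTACGTTT", "GGGACGTG", "ACGTCCCC"]
    # Hypothetical prior: high weight at one candidate start per sequence
    # (standing in for, e.g., a conserved position), low weight elsewhere.
    favored_starts = [2, 3, 0]
    width = 4
    priors = [
        [1.0 if j == s else 0.01 for j in range(len(q) - width + 1)]
        for q, s in zip(seqs, favored_starts)
    ]
    background = {b: 0.25 for b in BASES}
    pwm, post = em_with_prior(seqs, priors, width, background)
    print("".join(max(col, key=col.get) for col in pwm))  # consensus: ACGT
```

Setting every prior weight equal reduces this E-step to ordinary OOPS EM, so the prior acts as a strict generalization; in the setting the abstract describes, the weights would come from a discriminative, conservation-based score rather than being set by hand.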
