MotifLab: a tools and data integration workbench for motif discovery and regulatory sequence analysis

Background: Traditional methods for computational motif discovery often suffer from poor performance. In particular, methods that search for sequence matches to known binding motifs tend to predict many non-functional binding sites because they fail to take into consideration the biological state of the cell. In recent years, genome-wide studies have generated large amounts of data with the potential to improve our ability to identify functional motifs and binding sites, such as information about chromatin accessibility and epigenetic states in different cell types. However, it is not always trivial to make use of this data in combination with existing motif discovery tools, especially for researchers who are not skilled in bioinformatics programming.

Results: Here we present MotifLab, a general workbench for analysing regulatory sequence regions and discovering transcription factor binding sites and cis-regulatory modules. MotifLab supports comprehensive motif discovery and analysis by allowing users to integrate several popular motif discovery tools as well as different kinds of additional information, including phylogenetic conservation, epigenetic marks, DNase hypersensitive sites, ChIP-Seq data, positional binding preferences of transcription factors, transcription factor interactions and gene expression. MotifLab offers several data-processing operations that can be used to create, manipulate and analyse data objects. Complete analysis workflows can be constructed and automatically executed within MotifLab, including graphical presentation of the results.

Conclusions: We have developed MotifLab as a flexible workbench for motif analysis in a genomic context. The flexibility and effectiveness of this workbench have been demonstrated on selected test cases, in particular two previously published benchmark data sets for single motifs and modules, and a realistic example of genes responding to treatment with forskolin.
MotifLab is freely available at http://www.motiflab.org.
