DECOD: fast and accurate discriminative DNA motif finding

MOTIVATION: Motif discovery is now routinely used in high-throughput studies including large-scale sequencing and proteomics. These datasets present new challenges. The first is speed. Many motif discovery methods do not scale well to large datasets. Another issue is identifying discriminative rather than generative motifs. Such discriminative motifs are important for identifying co-factors and for explaining changes in behavior between different conditions.

RESULTS: To address these issues we developed a method for DECOnvolved Discriminative motif discovery (DECOD). DECOD uses a k-mer count table and so its running time is independent of the size of the input set. By deconvolving the k-mers DECOD considers context information without using the sequences directly. DECOD outperforms previous methods both in speed and in accuracy when using simulated and real biological benchmark data. We performed new binding experiments for p53 mutants and used DECOD to identify p53 co-factors, suggesting new mechanisms for p53 activation.

AVAILABILITY: The source code and binaries for DECOD are available at http://www.sb.cs.cmu.edu/DECOD

CONTACT: zivbj@cs.cmu.edu

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
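
To illustrate why working from a k-mer count table decouples the search from the size of the input, the following is a minimal sketch and not DECOD's actual implementation. The function names (`kmer_counts`, `count_difference`), the toy sequences, and the frequency-difference score are assumptions made for illustration, and the deconvolution step described in the abstract is not shown. The sequences are scanned once to build the tables; any later scoring touches only the tables, which hold at most 4^k entries however many sequences were read.

```python
from collections import Counter

def kmer_counts(sequences, k):
    """Count every length-k substring across a collection of DNA sequences."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts

def count_difference(pos_counts, neg_counts):
    """Relative frequency of each k-mer in the positive set minus the negative set.

    Hypothetical scoring for illustration only; it is not DECOD's objective.
    """
    pos_total = sum(pos_counts.values()) or 1
    neg_total = sum(neg_counts.values()) or 1
    kmers = set(pos_counts) | set(neg_counts)
    return {m: pos_counts[m] / pos_total - neg_counts[m] / neg_total
            for m in kmers}

# Toy data (assumed): two "bound" sequences and two background sequences.
positive = ["ACGTACGT", "TTACGTAA"]
negative = ["TTTTTTTT", "GGGGCCCC"]
k = 4

pos = kmer_counts(positive, k)   # one pass over the positive set
neg = kmer_counts(negative, k)   # one pass over the negative set
diff = count_difference(pos, neg)

# The k-mer most enriched in the positive set relative to the negative set.
top = max(diff, key=diff.get)
print(top, round(diff[top], 3))
```

In this sketch, everything after the two calls to `kmer_counts` depends only on the tables, so repeated scoring or refinement of candidates would not need to revisit the raw sequences.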
