FIMO: scanning for occurrences of a given motif

Summary: A motif is a short DNA or protein sequence that contributes to the biological function of the sequence in which it resides. Over the past several decades, many computational methods have been described for identifying, characterizing and searching with sequence motifs. Critical to nearly any motif-based sequence analysis pipeline is the ability to scan a sequence database for occurrences of a given motif described by a position-specific frequency matrix.

Results: We describe Find Individual Motif Occurrences (FIMO), a software tool for scanning DNA or protein sequences with motifs described as position-specific scoring matrices. The program computes a log-likelihood ratio score for each position in a given sequence database, uses established dynamic programming methods to convert this score to a P-value and then applies false discovery rate analysis to estimate a q-value for each position in the given sequence. FIMO provides output in a variety of formats, including HTML, XML and several Santa Cruz Genome Browser formats. The program is efficient, allowing for the scanning of DNA sequences at a rate of 3.5 Mb/s on a single CPU.

Availability and Implementation: FIMO is part of the MEME Suite software toolkit. A web server and source code are available at http://meme.sdsc.edu.

Contact: t.bailey@imb.uq.edu.au

Supplementary information: Supplementary data are available at Bioinformatics online.
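The three steps named under Results (a log-likelihood ratio score per position, a dynamic-programming conversion to a P-value, and a false discovery rate estimate) can be illustrated with a minimal Python sketch. This is not FIMO's code: the function names, the integer scaling factor, the uniform background and the toy 3-column motif are all assumptions made for illustration, and the q-value step uses a simple Benjamini-Hochberg style step-up estimate as a stand-in for whatever procedure FIMO actually applies.

```python
import math

def log_odds_matrix(freqs, background, scale=10):
    """Turn per-position letter frequencies into integer-scaled log-odds scores.
    Assumes every frequency is > 0 (e.g. pseudocounts were already added)."""
    return [[round(scale * math.log2(col[a] / background[a]))
             for a in range(len(background))] for col in freqs]

def score_distribution(pssm, background):
    """Dynamic programming over motif columns: probability of each total
    integer score for a random site drawn from the background model."""
    dist = {0: 1.0}
    for col in pssm:
        nxt = {}
        for total, prob in dist.items():
            for a, s in enumerate(col):
                nxt[total + s] = nxt.get(total + s, 0.0) + prob * background[a]
        dist = nxt
    return dist

def p_value(dist, score):
    """Probability of a background site scoring at least `score`."""
    return sum(p for s, p in dist.items() if s >= score)

def scan(seq, pssm, alphabet="ACGT"):
    """Score every window of the sequence; returns (start, score) pairs."""
    idx = {c: i for i, c in enumerate(alphabet)}
    w = len(pssm)
    return [(i, sum(pssm[j][idx[seq[i + j]]] for j in range(w)))
            for i in range(len(seq) - w + 1)]

def q_values(pvals):
    """Benjamini-Hochberg style step-up estimate (an illustrative stand-in)."""
    n = len(pvals)
    order = sorted(range(n), key=lambda i: pvals[i])
    q = [0.0] * n
    running = 1.0
    for rank in range(n, 0, -1):          # walk from largest p-value down
        i = order[rank - 1]
        running = min(running, pvals[i] * n / rank)
        q[i] = running
    return q

if __name__ == "__main__":
    background = [0.25, 0.25, 0.25, 0.25]           # assumed uniform A, C, G, T
    freqs = [[0.1, 0.1, 0.1, 0.7],                  # toy motif favouring "TAT"
             [0.7, 0.1, 0.1, 0.1],
             [0.1, 0.1, 0.1, 0.7]]
    pssm = log_odds_matrix(freqs, background)
    dist = score_distribution(pssm, background)
    hits = scan("GGTATGG", pssm)
    pvals = [p_value(dist, s) for _, s in hits]
    for (start, score), p, q in zip(hits, pvals, q_values(pvals)):
        print(start, score, p, q)
```

Rounding the scores to integers keeps the dynamic-programming table small, because many different sites collapse onto the same total score; a finer scale gives more precise P-values at the cost of a larger table.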
