The BaMM web server for de-novo motif discovery and regulatory sequence analysis

Abstract The BaMM web server offers four tools: (i) de-novo discovery of enriched motifs in a set of nucleotide sequences, (ii) scanning a set of nucleotide sequences with motifs to find motif occurrences, (iii) searching with an input motif for similar motifs in our BaMM database with motifs for >1000 transcription factors, trained from the GTRD ChIP-seq database and (iv) browsing and keyword searching the motif database. In contrast to most other servers, we represent sequence motifs not by position weight matrices (PWMs) but by Bayesian Markov Models (BaMMs) of order 4, which we showed previously to perform substantially better in ROC analyses than PWMs or first order models. To address the inadequacy of P- and E-values as measures of motif quality, we introduce the AvRec score, the average recall over the TP-to-FP ratio between 1 and 100. The BaMM server is freely accessible without registration at https://bammmotif.mpibpc.mpg.de.

[1]  David J. Arenillas,et al.  JASPAR 2018: update of the open-access database of transcription factor binding profiles and its web framework , 2017, Nucleic acids research.

[2]  G. Church,et al.  Nucleotides of transcription factor binding sites exert interdependent effects on the binding affinities of transcription factors. , 2002, Nucleic acids research.

[3]  Korbinian Strimmer,et al.  fdrtool: a versatile R package for estimating local and tail area-based false discovery rates , 2008, Bioinform..

[4]  William Stafford Noble,et al.  Quantifying similarity between motifs , 2007, Genome Biology.

[5]  F. A. Kolpakov,et al.  HOCOMOCO: towards a complete collection of transcription factor binding models for human and mouse via large-scale ChIP-Seq analysis , 2017, Nucleic Acids Res..

[6]  I. Davidson,et al.  The Human Transcription Factor IID Subunit Human TATA-binding Protein-associated Factor 28 Interacts in a Ligand-reversible Manner with the Vitamin D3 and Thyroid Hormone Receptors* , 2000, The Journal of Biological Chemistry.

[7]  Denis Thieffry,et al.  RSAT 2015: Regulatory Sequence Analysis Tools , 2015, Nucleic Acids Res..

[8]  Wyeth W. Wasserman,et al.  The Next Generation of Transcription Factor Binding Site Prediction , 2013, PLoS Comput. Biol..

[9]  Charles Elkan,et al.  Fitting a Mixture Model By Expectation Maximization To Discover Motifs In Biopolymer , 1994, ISMB.

[10]  Alexander E. Kel,et al.  GTRD: a database of transcription factor binding sites identified by ChIP-seq experiments , 2016, Nucleic Acids Res..

[11]  J. Söding,et al.  P-value-based regulatory motif discovery using positional weight matrices , 2013, Genome research.

[12]  A. Jolma,et al.  Methods for Analysis of Transcription Factor DNA-Binding Specificity In Vitro. , 2011, Sub-cellular biochemistry.

[13]  Mikael Bodén,et al.  MEME Suite: tools for motif discovery and searching , 2009, Nucleic Acids Res..

[14]  A. Jolma,et al.  DNA-dependent formation of transcription factor pairs alters their binding specificity , 2015, Nature.

[15]  R. Mann,et al.  The role of DNA shape in protein-DNA recognition , 2009, Nature.

[16]  Daniel E. Newburger,et al.  Diversity and Complexity in DNA Recognition by Transcription Factors , 2009, Science.

[17]  Johannes Söding,et al.  Bayesian Markov models consistently outperform PWMs at predicting motifs in nucleotide sequences , 2016, bioRxiv.

[18]  Victor G. Levitsky,et al.  From binding motifs in Chip-seq Data to Improved Models of transcription factor binding Sites , 2013, J. Bioinform. Comput. Biol..