DiffLogo: a comparative visualization of sequence motifs

BackgroundFor three decades, sequence logos are the de facto standard for the visualization of sequence motifs in biology and bioinformatics. Reasons for this success story are their simplicity and clarity. The number of inferred and published motifs grows with the number of data sets and motif extraction algorithms. Hence, it becomes more and more important to perceive differences between motifs. However, motif differences are hard to detect from individual sequence logos in case of multiple motifs for one transcription factor, highly similar binding motifs of different transcription factors, or multiple motifs for one protein domain.ResultsHere, we present DiffLogo, a freely available, extensible, and user-friendly R package for visualizing motif differences. DiffLogo is capable of showing differences between DNA motifs as well as protein motifs in a pair-wise manner resulting in publication-ready figures. In case of more than two motifs, DiffLogo is capable of visualizing pair-wise differences in a tabular form. Here, the motifs are ordered by similarity, and the difference logos are colored for clarity. We demonstrate the benefit of DiffLogo on CTCF motifs from different human cell lines, on E-box motifs of three basic helix-loop-helix transcription factors as examples for comparison of DNA motifs, and on F-box domains from three different families as example for comparison of protein motifs.ConclusionsDiffLogo provides an intuitive visualization of motif differences. It enables the illustration and investigation of differences between highly similar motifs such as binding patterns of transcription factors for different cell types, treatments, and algorithmic approaches.

[1]  K. Gevaert,et al.  Improved visualization of protein consensus sequences by iceLogo , 2009, Nature Methods.

[2]  Jens Keilwagen,et al.  Varying levels of complexity in transcription factor binding motifs , 2015, Nucleic acids research.

[3]  Jianhua Lin,et al.  Divergence measures based on the Shannon entropy , 1991, IEEE Trans. Inf. Theory.

[4]  Oliver Bembom,et al.  Sequence logos for DNA sequence alignments , 2016 .

[5]  Amos Bairoch,et al.  PROSITE, a protein domain database for functional characterization and annotation , 2009, Nucleic Acids Res..

[6]  Jean YH Yang,et al.  Bioconductor: open software development for computational biology and bioinformatics , 2004, Genome Biology.

[7]  Yue Zhao,et al.  Inferring Binding Energies from Selected Binding Sites , 2009, PLoS Comput. Biol..

[8]  S. Hannenhalli,et al.  CTCF binding site sequence differences are associated with unique regulatory and functional trends during embryonic stem cell differentiation , 2013, Nucleic acids research.

[9]  Juan M. Vaquerizas,et al.  Multiplexed massively parallel SELEX for characterization of human transcription factor binding specificities. , 2010, Genome research.

[10]  S. Karlin,et al.  Prediction of complete gene structures in human genomic DNA. , 1997, Journal of molecular biology.

[11]  R Core Team,et al.  R: A language and environment for statistical computing. , 2014 .

[12]  Bart De Moor,et al.  Computational detection of cis-regulatory modules , 2003, ECCB.

[13]  S. M. Ali,et al.  A General Class of Coefficients of Divergence of One Distribution from Another , 1966 .

[14]  Nicola J. Rinaldi,et al.  Transcriptional regulatory code of a eukaryotic genome , 2004, Nature.

[15]  R. Shamir,et al.  Transcription factor and microRNA motif discovery: the Amadeus platform and a compendium of metazoan target sets. , 2008, Genome research.

[16]  D. Galas,et al.  DNAse footprinting: a simple method for the detection of protein-DNA binding specificity. , 1978, Nucleic acids research.

[17]  Michael Q. Zhang,et al.  A highly efficient and effective motif discovery method for ChIP-seq/ChIP-chip data using positional information , 2011, Nucleic acids research.

[18]  Charles Elkan,et al.  Fitting a Mixture Model By Expectation Maximization To Discover Motifs In Biopolymer , 1994, ISMB.

[19]  Steven J. M. Jones,et al.  Locating mammalian transcription factor binding sites: a survey of computational and experimental techniques. , 2006, Genome research.

[20]  Wilfred W. Li,et al.  MEME: discovering and analyzing DNA and protein sequence motifs , 2006, Nucleic Acids Res..

[21]  Panayiotis V. Benos,et al.  STAMP: a web tool for exploring DNA-binding motif similarities , 2007, Nucleic Acids Res..

[22]  Victor V Lobanenkov,et al.  A genome-wide map of CTCF multivalency redefines the CTCF code. , 2013, Cell reports.

[23]  Wyeth W. Wasserman,et al.  JASPAR: an open-access database for eukaryotic transcription factor binding profiles , 2004, Nucleic Acids Res..

[24]  Vladimir Vacic,et al.  Two Sample Logo: a graphical representation of the differences between two sets of sequence alignments , 2006, Bioinform..

[25]  Martha L. Bulyk,et al.  UniPROBE: an online database of protein binding microarray data on protein–DNA interactions , 2008, Nucleic Acids Res..

[26]  R Staden Computer methods to locate signals in nucleic acid sequences , 1984, Nucleic Acids Res..

[27]  A. Mortazavi,et al.  Genome-Wide Mapping of in Vivo Protein-DNA Interactions , 2007, Science.

[28]  Christopher B. Burge,et al.  Maximum entropy modeling of short sequence motifs with applications to RNA splicing signals , 2003, RECOMB '03.

[29]  E. Birney,et al.  Pfam: the protein families database , 2013, Nucleic Acids Res..

[30]  T. D. Schneider,et al.  Sequence logos: a new way to display consensus sequences. , 1990, Nucleic acids research.

[31]  J. Keilwagen,et al.  On the Value of Intra-Motif Dependencies of Human Insulator Protein CTCF , 2014, PloS one.

[32]  Barbara E. Engelhardt,et al.  Stability selection for regression-based models of transcription factor–DNA binding specificity , 2013, Bioinform..

[33]  Vsevolod J. Makeev,et al.  Deep and wide digging for binding motifs in ChIP-Seq data , 2010, Bioinform..

[34]  G. Church,et al.  Computational identification of cis-regulatory elements associated with groups of functionally related genes in Saccharomyces cerevisiae. , 2000, Journal of molecular biology.

[35]  Jens Keilwagen,et al.  A general approach for discriminative de novo motif discovery from high-throughput data , 2013, GCB.