Logomaker: beautiful sequence logos in Python

Sequence logos are visually compelling ways of illustrating the biological properties of DNA, RNA, and protein sequences, yet it is currently difficult to generate such logos within the Python programming environment. Here we introduce Logomaker, a Python API for creating publication-quality sequence logos. Logomaker can produce both standard and highly customized logos from any matrix-like array of numbers. Logos are rendered as vector graphics that are easy to stylize using standard matplotlib functions. Methods for creating logos from multiple-sequence alignments are also included.

Availability and Implementation: Logomaker can be installed using the pip package manager and is compatible with both Python 2.7 and Python 3.6. Source code is available at http://github.com/jbkinney/logomaker.

Supplemental Information: Documentation is provided at http://logomaker.readthedocs.io.

Contact: jkinney@cshl.edu
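To make concrete what "any matrix-like array of numbers" means for a logo built from a multiple-sequence alignment, the following is a minimal sketch of the classic information-content calculation, written with the Python standard library only. It is not Logomaker's implementation: the function names `counts_matrix` and `information_matrix` and the toy alignment are illustrative assumptions, and no small-sample correction or pseudocounts are applied.

```python
import math
from collections import Counter


def counts_matrix(seqs, alphabet="ACGT"):
    """Return one dict per alignment column mapping each character to its count.

    Hypothetical helper for illustration; assumes equal-length sequences.
    """
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("all sequences must be non-empty and of equal length")
    rows = []
    for column in zip(*seqs):
        tally = Counter(column)
        rows.append({ch: tally.get(ch, 0) for ch in alphabet})
    return rows


def information_matrix(counts, alphabet="ACGT"):
    """Scale per-position frequencies by that position's information content (bits)."""
    max_bits = math.log2(len(alphabet))
    rows = []
    for row in counts:
        total = sum(row.values())
        if total == 0:
            # Column contains no characters from the alphabet (e.g. all gaps).
            rows.append({ch: 0.0 for ch in alphabet})
            continue
        freqs = {ch: row[ch] / total for ch in alphabet}
        entropy = -sum(p * math.log2(p) for p in freqs.values() if p > 0)
        info = max_bits - entropy
        rows.append({ch: p * info for ch, p in freqs.items()})
    return rows


# Toy alignment (made up for this sketch).
alignment = ["ACGT", "ACGA", "ACTT", "AGGT"]
counts = counts_matrix(alignment)
heights = information_matrix(counts)

for position, row in enumerate(heights):
    print(position, {ch: round(h, 3) for ch, h in row.items()})
```

Each row of `heights` gives the character heights for one logo position. A matrix of this shape (positions as rows, characters as columns) is the kind of input the abstract describes. Rendering it is left to the plotting library.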
