Haystack: systematic analysis of the variation of epigenetic states and cell-type specific regulatory elements

Motivation With the increasing amount of genomic and epigenomic data in the public domain, a pressing challenge is to integrate these data to investigate the role of epigenetic mechanisms in regulating gene expression and maintenance of cell-identity. To this end, we have implemented a computational pipeline to systematically study epigenetic variability and uncover regulatory DNA sequences. Results Haystack is a bioinformatics pipeline to identify hotspots of epigenetic variability across different cell-types, cell-type specific cis-regulatory elements, and associated transcription factors. Haystack is generally applicable to any epigenetic mark and provides an important tool to investigate the mechanisms underlying epigenetic switches during development. This software is accompanied by a set of precomputed tracks, which may be used as a valuable resource for functional annotation of the human genome. Availability and implementation The Haystack pipeline is implemented as an open-source, multiplatform, Python package called haystack_bio freely available at https://github.com/pinellolab/haystack_bio. Contact lpinello@mgh.harvard.edu or gcyuan@jimmy.harvard.edu. Supplementary information Supplementary data are available at Bioinformatics online.

[1]  Manolis Kellis,et al.  Multi-scale chromatin state annotation using a hierarchical hidden Markov model , 2017, Nature Communications.

[2]  David J. Arenillas,et al.  JASPAR 2016: a major expansion and update of the open-access database of transcription factor binding profiles , 2015, Nucleic Acids Res..

[3]  C. Glass,et al.  Simple combinations of lineage-determining transcription factors prime cis-regulatory elements required for macrophage and B cell identities. , 2010, Molecular cell.

[4]  Data production leads,et al.  An integrated encyclopedia of DNA elements in the human genome , 2012 .

[5]  ENCODEConsortium,et al.  An Integrated Encyclopedia of DNA Elements in the Human Genome , 2012, Nature.

[6]  Raymond K. Auerbach,et al.  An Integrated Encyclopedia of DNA Elements in the Human Genome , 2012, Nature.

[7]  Yanjun Qi,et al.  DeepChrome: deep-learning for predicting gene expression from histone modifications , 2016, Bioinform..

[8]  C. Allis,et al.  Translating the Histone Code , 2001, Science.

[9]  Manolis Kellis,et al.  ChromHMM: automating chromatin-state discovery and characterization , 2012, Nature Methods.

[10]  William Stafford Noble,et al.  FIMO: scanning for occurrences of a given motif , 2011, Bioinform..

[11]  T. Mikkelsen,et al.  The NIH Roadmap Epigenomics Mapping Consortium , 2010, Nature Biotechnology.

[12]  Timothy L. Bailey,et al.  Gene expression Advance Access publication May 4, 2011 DREME: motif discovery in transcription factor ChIP-seq data , 2011 .

[13]  Kevin C. Chen,et al.  Spectacle: fast chromatin state annotation using spectral learning , 2015, Genome Biology.

[14]  S. Orkin,et al.  Analysis of chromatin-state plasticity identifies cell-type–specific regulators of H3K27me3 patterns , 2014, Proceedings of the National Academy of Sciences.

[15]  William Stafford Noble,et al.  Unsupervised pattern discovery in human chromatin structure through genomic segmentation , 2012, Nature Methods.