ChiLin: a comprehensive ChIP-seq and DNase-seq quality control and analysis pipeline

Background
Transcription factor binding, histone modification, and chromatin accessibility studies are important approaches to understanding the biology of gene regulation. ChIP-seq and DNase-seq have become the standard techniques for studying protein-DNA interactions and chromatin accessibility, respectively, and comprehensive quality control (QC) and analysis tools are critical to extracting the most value from these assay types. Although many analysis and QC tools have been reported, few combine ChIP-seq and DNase-seq data analysis and quality control in a unified framework with a comprehensive and unbiased reference of data quality metrics.

Results
ChiLin is a computational pipeline that automates the quality control and data analysis of ChIP-seq and DNase-seq data. It is built on a flexible, modular software framework that can be easily extended and modified. ChiLin is ideal for batch processing of many datasets and is well suited to large collaborative projects involving ChIP-seq and DNase-seq experiments with different designs. ChiLin generates comprehensive quality control reports that include comparisons with historical data derived from over 23,677 public ChIP-seq and DNase-seq samples (11,265 datasets) classified into eight literature-based categories. To the best of our knowledge, this atlas represents the most comprehensive ChIP-seq and DNase-seq quality metric resource currently available. These historical metrics provide useful heuristic quality references for experiments across all commonly used assay types. Using representative datasets, we demonstrate the versatility of the pipeline by applying it to ChIP-seq data from different assay types. The pipeline software is available open source at https://github.com/cfce/chilin.

Conclusion
ChiLin is a scalable and powerful tool for processing large batches of ChIP-seq and DNase-seq datasets. The analysis output and quality metrics are organized into user-friendly directories and reports. We have compiled 23,677 profiles into a comprehensive quality atlas with fine-grained classification for users.
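One way to picture the comparison with historical data is to place each QC metric of a new sample within the distribution of that metric across published samples of the same category. The sketch below is not ChiLin's implementation. It assumes a simple in-memory atlas keyed by category and then by metric, and the category name, the metric names (`uniquely_mapped_ratio`, `frip`), the helper names, and all values are hypothetical and made up for illustration. It reports a percentile rank, counting ties as half.

```python
from bisect import bisect_left, bisect_right
from typing import Dict, List, Sequence

# Hypothetical atlas layout: category -> QC metric name -> historical values.
Atlas = Dict[str, Dict[str, List[float]]]


def percentile_rank(history: Sequence[float], value: float) -> float:
    """Percentage of historical values below `value`, counting ties as half."""
    if not history:
        raise ValueError("historical reference is empty")
    ordered = sorted(history)
    below = bisect_left(ordered, value)          # number of strictly smaller values
    ties = bisect_right(ordered, value) - below  # number of values equal to `value`
    return 100.0 * (below + 0.5 * ties) / len(ordered)


def compare_to_atlas(category: str, sample: Dict[str, float], atlas: Atlas) -> Dict[str, float]:
    """Percentile rank of each sample metric among samples of the same category.

    Metrics with no historical values for that category are skipped.
    """
    reference = atlas.get(category, {})
    return {
        name: percentile_rank(reference[name], value)
        for name, value in sample.items()
        if reference.get(name)
    }


if __name__ == "__main__":
    # Made-up historical values for one hypothetical category (illustration only).
    atlas: Atlas = {
        "transcription_factor": {
            "uniquely_mapped_ratio": [0.55, 0.62, 0.70, 0.74, 0.81, 0.88],
            "frip": [0.01, 0.03, 0.05, 0.08, 0.12, 0.20],
        }
    }
    sample = {"uniquely_mapped_ratio": 0.74, "frip": 0.02}
    for metric, rank in compare_to_atlas("transcription_factor", sample, atlas).items():
        print(f"{metric}: percentile rank {rank:.1f}")
```

Run as a script, this prints a percentile rank of 58.3 for the uniquely mapped ratio and 16.7 for the fraction of reads in peaks. A percentile rank is used here because it makes no assumption about the shape of the historical distribution; a real report might additionally flag samples that fall below a chosen cutoff.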
