Computational methodology for ChIP-seq analysis

Chromatin immunoprecipitation coupled with massively parallel sequencing (ChIP-seq) is a powerful technology for identifying the genome-wide locations of DNA-binding proteins such as transcription factors or modified histones. As more experimental laboratories adopt ChIP-seq to unravel transcriptional and epigenetic regulatory mechanisms, computational analyses of ChIP-seq data have become increasingly comprehensive and sophisticated. In this article, we review current computational methodology for ChIP-seq analysis, recommend useful algorithms and workflows, and introduce quality control measures at different analytical steps. We also discuss how ChIP-seq can be integrated with other types of genomic assays, such as gene expression profiling and genome-wide association studies, to provide a more comprehensive view of gene regulatory mechanisms in important physiological and pathological processes.
