Hierarchical hidden Markov model with application to joint analysis of ChIP-chip and ChIP-seq data

MOTIVATION Chromatin immunoprecipitation (ChIP) followed by array hybridization, or ChIP-chip, is a powerful and widely used approach for identifying transcription factor binding sites (TFBS). Recently, massively parallel sequencing coupled with ChIP (ChIP-seq) has been increasingly used as an alternative to ChIP-chip, offering cost-effective genome-wide coverage and resolution up to a single base pair. For many well-studied transcription factors (TFs), both ChIP-seq and ChIP-chip experiments have been performed and their data are publicly available. Previous analyses have revealed substantial technology-specific binding signals despite strong correlation between the two sets of results. It is therefore of interest to see whether the two data sources can be combined to enhance the detection of TFBS.

RESULTS In this work, a hierarchical hidden Markov model (HHMM) is proposed for combining data from ChIP-seq and ChIP-chip. In the HHMM, inference results from individual HMMs fitted to the ChIP-seq and ChIP-chip experiments are summarized by a higher-level HMM. Simulation studies show the advantage of the HHMM when data from both technologies are available. Analysis of two well-studied TFs, neuron-restrictive silencer factor (NRSF) and CCCTC-binding factor (CTCF), also suggests that the HHMM yields improved TFBS identification in comparison to analyses using individual data sources or a simple merger of the two.

AVAILABILITY Source code for the software ChIPmeta, implemented in C and supported on Linux, is freely available for download at http://www.umich.edu/~hwchoi/HHMMsoftware.zip
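The two-level idea in the abstract, in which a higher-level HMM summarizes inference results from technology-specific HMMs, can be illustrated with a minimal Python sketch. This is not the ChIPmeta implementation, and everything specific in it is an assumption made for illustration: the two-state (unbound/bound) layout, the Gaussian emission for ChIP-chip intensities, the Poisson emission for ChIP-seq counts, all parameter values, the toy data, and the `reliability` parameter that links the lower-level posteriors to the upper level.

```python
import math

def forward_backward(emis, trans, init):
    """Posterior state probabilities of an HMM (scaled forward-backward).
    emis[t][k] is the likelihood of observation t under state k."""
    n, K = len(emis), len(init)
    alpha, prev = [], None
    for t in range(n):
        if t == 0:
            cur = [init[k] * emis[0][k] for k in range(K)]
        else:
            cur = [emis[t][k] * sum(prev[j] * trans[j][k] for j in range(K))
                   for k in range(K)]
        s = sum(cur)
        prev = [a / s for a in cur]          # rescale to avoid underflow
        alpha.append(prev)
    beta = [[1.0] * K for _ in range(n)]
    for t in range(n - 2, -1, -1):
        b = [sum(trans[k][j] * emis[t + 1][j] * beta[t + 1][j] for j in range(K))
             for k in range(K)]
        s = sum(b)
        beta[t] = [x / s for x in b]
    post = []
    for t in range(n):
        g = [alpha[t][k] * beta[t][k] for k in range(K)]
        s = sum(g)
        post.append([x / s for x in g])
    return post

def gauss(x, mu, sd):
    return math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

def poisson(k, lam):
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))

# Hypothetical parameters; state 0 = unbound, state 1 = bound.
TRANS = [[0.9, 0.1], [0.2, 0.8]]
INIT = [0.9, 0.1]

def lower_chip(values):
    """Lower-level HMM for ChIP-chip: assumed Gaussian emissions."""
    emis = [[gauss(x, 0.0, 1.0), gauss(x, 2.0, 1.0)] for x in values]
    return [p[1] for p in forward_backward(emis, TRANS, INIT)]

def lower_seq(counts):
    """Lower-level HMM for ChIP-seq: assumed Poisson emissions."""
    emis = [[poisson(c, 1.0), poisson(c, 10.0)] for c in counts]
    return [p[1] for p in forward_backward(emis, TRANS, INIT)]

def combine(p_chip, p_seq, reliability=0.9):
    """Upper-level HMM: treats each lower-level posterior as soft evidence.
    `reliability` is an assumed chance that a lower-level call agrees with
    the true upper-level state."""
    r = reliability
    emis = []
    for pc, ps in zip(p_chip, p_seq):
        unbound = (pc * (1 - r) + (1 - pc) * r) * (ps * (1 - r) + (1 - ps) * r)
        bound = (pc * r + (1 - pc) * (1 - r)) * (ps * r + (1 - ps) * (1 - r))
        emis.append([unbound, bound])
    return [p[1] for p in forward_backward(emis, TRANS, INIT)]

# Toy probes over the same region: a shared peak at positions 3 to 5.
chip = [0.1, -0.2, 0.0, 2.1, 2.4, 1.9, 0.1, -0.1]
seq = [1, 0, 2, 9, 12, 10, 1, 0]
upper = combine(lower_chip(chip), lower_seq(seq))
print([round(p, 3) for p in upper])
```

The design point of the sketch is only that the upper level never sees raw intensities or counts: it operates on per-position summaries from each lower-level model, which lets the two technologies keep their own emission distributions.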
