Combining multiple ChIP-seq peak detection systems using combinatorial fusion

Background: ChIP-seq technologies, which use high-throughput next-generation DNA sequencing to identify the targets of chromatin immunoprecipitation, have developed rapidly in recent years. The growing volume of sequencing data they generate offers greater opportunity to analyze genome-wide protein-DNA interactions. In particular, we are interested in evaluating and enhancing computational and statistical techniques for locating protein binding sites. Many peak detection systems have been developed; in this study, we use the following six: CisGenome, MACS, PeakSeq, QuEST, SISSRs, and TRLocator.

Results: We define two methods to merge and rescore the regions reported by two peak detection systems, and we analyze their performance in terms of average precision and coverage of transcription start sites. The results indicate that ChIP-seq peak detection can be improved by fusion using score or rank combination.

Conclusion: Our method of combination and fusion analysis provides a means for generic assessment of available technologies and systems, and it can assist researchers in choosing an appropriate system (or fusion method) for analyzing ChIP-seq data. This analysis offers an alternative approach for increasing true positive rates while decreasing false positive rates, and hence for improving the ChIP-seq peak identification process.
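The idea of merging the regions of two peak detection systems and rescoring them by score or rank combination can be illustrated with a minimal sketch. The abstract does not specify the merging rule or the combination formulas, so everything below is an assumption made for illustration: peaks are `(chrom, start, end, score)` tuples, overlapping regions are merged into their union, scores are min-max normalized per system, a region missed by one system receives score 0 and a rank one past that system's last rank, and the combined values are simple averages. The function names (`normalize`, `rank_peaks`, `fuse`) are hypothetical and are not taken from the paper or from any of the six tools.

```python
from typing import List, Tuple

Peak = Tuple[str, int, int, float]  # (chrom, start, end, score), assumed layout


def normalize(peaks: List[Peak]) -> List[Peak]:
    """Min-max scale one system's scores to [0, 1] (assumed normalization)."""
    scores = [p[3] for p in peaks]
    lo, hi = min(scores), max(scores)
    span = (hi - lo) or 1.0  # avoid division by zero when all scores are equal
    return [(c, s, e, (v - lo) / span) for c, s, e, v in peaks]


def rank_peaks(peaks: List[Peak]) -> List[int]:
    """Rank 1 is the highest score; ties are broken by input order."""
    order = sorted(range(len(peaks)), key=lambda i: -peaks[i][3])
    ranks = [0] * len(peaks)
    for r, i in enumerate(order, start=1):
        ranks[i] = r
    return ranks


def fuse(sys_a: List[Peak], sys_b: List[Peak]):
    """Merge overlapping regions of two systems and rescore each merged region.

    Returns (chrom, start, end, score_combination, rank_combination) tuples.
    Higher score_combination is better; lower rank_combination is better.
    """
    tagged = []
    for tag, peaks in enumerate((sys_a, sys_b)):
        if not peaks:
            continue
        for (c, s, e, v), r in zip(normalize(peaks), rank_peaks(peaks)):
            tagged.append((c, s, e, tag, v, r))
    tagged.sort(key=lambda t: (t[0], t[1]))

    # Assumed penalty for a region that one system did not report.
    missing_rank = [len(sys_a) + 1, len(sys_b) + 1]

    merged = []
    for c, s, e, tag, v, r in tagged:
        if merged and merged[-1]["chrom"] == c and s < merged[-1]["end"]:
            region = merged[-1]
            region["end"] = max(region["end"], e)  # extend to the union
        else:
            region = {"chrom": c, "start": s, "end": e,
                      "score": [0.0, 0.0], "rank": list(missing_rank)}
            merged.append(region)
        # Keep each system's best score and best rank within the region.
        region["score"][tag] = max(region["score"][tag], v)
        region["rank"][tag] = min(region["rank"][tag], r)

    return [(m["chrom"], m["start"], m["end"],
             sum(m["score"]) / 2, sum(m["rank"]) / 2) for m in merged]


if __name__ == "__main__":
    a = [("chr1", 100, 200, 10.0), ("chr1", 500, 600, 2.0)]
    b = [("chr1", 150, 250, 0.9), ("chr2", 10, 50, 0.1)]
    for region in fuse(a, b):
        print(region)
```

In this toy input, the region reported by both systems (`chr1:100-250` after merging) ends up at the top under both combinations, while regions reported by only one system are pushed down. Whether score or rank combination performs better on real data is an empirical question that a sketch like this cannot answer.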
