Comparison of Four ChIP-Seq Analytical Algorithms Using Rice Endosperm H3K27 Trimethylation Profiling Data

Chromatin immunoprecipitation coupled with high-throughput DNA sequencing (ChIP-Seq) has emerged as a powerful tool for genome-wide profiling of the binding sites of DNA-associated proteins such as histones and transcription factors. However, no peak-calling program has gained consensus acceptance by the scientific community as the preferred tool for ChIP-Seq data analysis, and analyzing the large data sets generated by ChIP-Seq studies remains highly challenging for most molecular biology laboratories. Here we profile H3K27me3 enrichment sites in young rice endosperm using ChIP-Seq and analyze the data with four peak-calling algorithms (FindPeaks, PeakSeq, USeq, and MACS). Comparison of the four algorithms reveals that they produce very different peaks in terms of peak size, peak number, and position relative to genes. We use ChIP-PCR to verify the predicted peaks and thereby evaluate the accuracy of each algorithm. We discuss the approach taken by each algorithm and compare the similarities and differences in their results. Despite the differences in the peaks identified, all four programs lead to similar conclusions about the effect of H3K27me3 on gene expression: its presence either upstream or downstream of a gene is predominantly associated with repression of that gene. Additionally, GO analysis finds that a substantially higher proportion of H3K27me3-associated genes than of genes in the rest of the rice genome is involved in multicellular organism development, signal transduction, response to external and endogenous stimuli, and secondary metabolic pathways.
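The comparison criteria named above (peak number, peak size, and position relative to genes) can be illustrated with a minimal Python sketch. It is not the procedure used in the study: the caller names, the toy coordinates, the half-open interval convention, and the midpoint-based position rule are all assumptions chosen for illustration.

```python
from statistics import median

# Assumption: a peak is (chrom, start, end) with half-open coordinates,
# and a gene is (chrom, start, end, strand). All values below are toy data.

def summarize(peaks):
    """Return peak count and median peak width for one caller's output."""
    widths = [end - start for _, start, end in peaks]
    return {"count": len(peaks), "median_width": median(widths) if widths else 0}

def overlaps(a, b):
    """True if two intervals share a chromosome and at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]

def fraction_overlapping(query, reference):
    """Fraction of query peaks that overlap any reference peak."""
    if not query:
        return 0.0
    hits = sum(1 for q in query if any(overlaps(q, r) for r in reference))
    return hits / len(query)

def position_relative_to_gene(peak, gene):
    """Classify a peak by where its midpoint falls relative to a gene."""
    chrom, g_start, g_end, strand = gene
    if peak[0] != chrom:
        return "other_chromosome"
    mid = (peak[1] + peak[2]) / 2
    if g_start <= mid < g_end:
        return "gene_body"
    before = mid < g_start
    # On the minus strand, lower coordinates lie downstream of the gene.
    if strand == "+":
        return "upstream" if before else "downstream"
    return "downstream" if before else "upstream"

# Hypothetical output of two peak callers on the same data set.
calls = {
    "caller_A": [("chr1", 100, 600), ("chr1", 2000, 2300)],
    "caller_B": [("chr1", 400, 1500), ("chr2", 50, 90)],
}
gene = ("chr1", 1000, 1800, "+")  # hypothetical gene model

if __name__ == "__main__":
    for name, peaks in calls.items():
        print(name, summarize(peaks),
              [position_relative_to_gene(p, gene) for p in peaks])
    print(fraction_overlapping(calls["caller_A"], calls["caller_B"]))
```

The pairwise overlap fraction is asymmetric by design: a caller that reports a few broad peaks can cover most of another caller's narrow peaks while the reverse fraction stays low, which is one way differences in peak size show up in such a comparison.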
