Predicting gene expression from genome-wide protein binding profiles

Abstract

High-throughput technologies such as chromatin immunoprecipitation (IP) followed by next-generation sequencing (ChIP-seq), combined with gene expression studies, have enabled researchers to investigate relationships between the distribution of chromosome-associated proteins and the regulation of gene transcription on a genome-wide scale. Several integrative analyses have identified direct relationships between the two processes. However, a comprehensive understanding of the regulatory events remains elusive. This is partly due to the scarcity of robust analytical methods for detecting binding regions from ChIP-seq data. In this paper, we apply a recently proposed Markov random field model to detect enriched binding regions under different biological conditions and at different time points. The method accounts for spatial dependencies and for IP efficiencies, which can vary significantly between experiments. We then classify the enriched binding regions by the genomic feature in which they fall, such as promoter, exon, intron or distal intergenic, and investigate how predictive each of these features is of gene expression activity using machine learning techniques, including neural networks, decision trees and random forests. The analysis of a ChIP-seq time-series dataset comprising six protein markers, together with microarray data obtained from the same biological samples, shows promising results and identifies biologically plausible relationships between the protein binding profiles and gene regulation.
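The idea behind a Markov random field for calling enriched regions can be illustrated with a much simpler model than the one used in the paper. The sketch below assumes the genome has been cut into adjacent bins with read counts, that each bin is either background (state 0) or enriched (state 1), that counts follow a Poisson distribution with a fixed, hypothetical rate per state, and that neighbouring bins are encouraged to share a state through a coupling parameter `beta`. Labels are decoded with iterated conditional modes (ICM). This is an assumed stand-in for illustration only: the published model is more elaborate (for example, it accounts for IP efficiency), and `bg_rate`, `enriched_rate` and `beta` here are hand-picked rather than estimated.

```python
import math
from typing import List


def log_poisson(k: int, rate: float) -> float:
    """Log-probability of observing k reads in a bin under Poisson(rate)."""
    return k * math.log(rate) - rate - math.lgamma(k + 1)


def icm_enrichment(counts: List[int], bg_rate: float, enriched_rate: float,
                   beta: float, max_sweeps: int = 50) -> List[int]:
    """Label bins 0 (background) or 1 (enriched) with a hidden Ising-type chain.

    Hypothetical simplification: fixed Poisson rates per state and a single
    coupling strength `beta` rewarding agreement with adjacent bins.
    """
    rates = (bg_rate, enriched_rate)
    # Initialise with the state favoured by each bin's read count alone.
    labels = [int(log_poisson(k, enriched_rate) > log_poisson(k, bg_rate))
              for k in counts]
    for _ in range(max_sweeps):
        changed = False
        for i, k in enumerate(counts):
            neighbours = [labels[j] for j in (i - 1, i + 1) if 0 <= j < len(counts)]
            # Score = Poisson log-likelihood + beta per neighbour in the same state.
            scores = [log_poisson(k, rates[s]) + beta * sum(1 for n in neighbours if n == s)
                      for s in (0, 1)]
            best = int(scores[1] > scores[0])
            if best != labels[i]:
                labels[i] = best
                changed = True
        if not changed:  # ICM has reached a local optimum
            break
    return labels
```

With these assumed parameters, `icm_enrichment([0, 1, 0, 11, 3, 9, 0, 1, 0], 1.0, 10.0, 1.5)` returns `[0, 0, 0, 1, 1, 1, 0, 0, 0]`: the bin with 3 reads is pulled into the enriched region by its neighbours, whereas with `beta=0.0` it stays background. This spatial borrowing of strength is the kind of dependency the abstract refers to.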
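Classifying enriched regions by genomic feature (promoter, exon, intron, distal intergenic) and turning them into per-gene predictors can be sketched as follows. This assumes that each region is summarised by a single position and has already been assigned to one gene (for example, its nearest gene), and that a "promoter" is a hypothetical window of ±3 kb around the transcription start site (TSS). The priority order (promoter, then exon, then intron, otherwise distal intergenic) is also an assumption; dedicated annotation tools use finer categories and configurable priorities. The names `Gene`, `annotate`, `feature_vector` and the marker labels are illustrative only.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

PROMOTER_WINDOW = 3000  # hypothetical: +/- 3 kb around the TSS
CATEGORIES = ("promoter", "exon", "intron", "distal intergenic")


@dataclass
class Gene:
    start: int                    # leftmost coordinate (inclusive)
    end: int                      # rightmost coordinate (inclusive)
    strand: str                   # '+' or '-'
    exons: List[Tuple[int, int]]  # inclusive (start, end) pairs

    @property
    def tss(self) -> int:
        # The TSS is the 5' end of the gene, which depends on the strand.
        return self.start if self.strand == "+" else self.end


def annotate(position: int, gene: Gene, window: int = PROMOTER_WINDOW) -> str:
    """Assign a region position to one genomic feature, in priority order."""
    if abs(position - gene.tss) <= window:
        return "promoter"
    if any(s <= position <= e for s, e in gene.exons):
        return "exon"
    if gene.start <= position <= gene.end:
        return "intron"
    return "distal intergenic"


def feature_vector(regions_by_marker: Dict[str, List[int]], gene: Gene) -> Dict[str, int]:
    """Binary indicators: does each marker have a region in each feature class?"""
    features = {}
    for marker, positions in regions_by_marker.items():
        found = {annotate(p, gene) for p in positions}
        for category in CATEGORIES:
            features[f"{marker}:{category}"] = int(category in found)
    return features
```

For a plus-strand gene spanning 10,000 to 20,000 with exons at (10,000, 10,500), (15,000, 15,400) and (19,500, 20,000), a region at 9,500 is labelled a promoter, one at 15,200 an exon, one at 17,000 an intron and one at 40,000 distal intergenic. One such binary vector per gene, across all markers, is the kind of input a classifier can then relate to expression.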
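One simple way to ask how predictive each binary (marker, genomic feature) indicator is of expression is to rank the indicators by information gain, the split criterion used by ID3-style decision trees. The sketch below is a swapped-in illustration, not the neural network, decision tree or random forest analysis reported in the paper; it assumes each gene carries a hypothetical binary label (1 = expressed, 0 = not expressed), and the feature names and toy data are invented.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def entropy(labels: List[int]) -> float:
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature: List[int], labels: List[int]) -> float:
    """Reduction in label entropy after splitting genes on one feature."""
    n = len(labels)
    gain = entropy(labels)
    for value in set(feature):
        subset = [y for x, y in zip(feature, labels) if x == value]
        gain -= len(subset) / n * entropy(subset)
    return gain


def rank_features(features: Dict[str, List[int]], labels: List[int]) -> List[Tuple[str, float]]:
    """Sort features from most to least informative about the labels."""
    scores = {name: information_gain(col, labels) for name, col in features.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy data for eight genes (hypothetical).
labels = [1, 1, 1, 1, 0, 0, 0, 0]
features = {
    "marker_A:promoter": [1, 1, 1, 1, 0, 0, 0, 0],
    "marker_A:distal intergenic": [1, 0, 1, 0, 1, 0, 1, 0],
    "marker_B:intron": [1, 1, 1, 0, 0, 0, 0, 1],
}
ranking = rank_features(features, labels)
```

On this toy data `marker_A:promoter` separates the classes perfectly (gain 1.0 bit), `marker_B:intron` is weakly informative (about 0.19 bits) and `marker_A:distal intergenic` carries no information (0.0 bits). A random forest would instead aggregate many randomised trees and report its own importance measures, but the single-feature ranking conveys the same underlying question.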
