A comprehensive review of computational prediction of genome-wide features.

Different types of genetic, genomic and epigenomic features are significantly correlated within the genome. These correlations make in silico feature prediction possible through statistical or machine learning models. With the accumulation of vast amounts of high-throughput data, feature prediction has attracted considerable interest, and a large number of papers have been published in the past few years. Here we provide a comprehensive review of these published works, categorized by prediction target: protein binding sites, enhancers, DNA methylation, chromatin structure and gene expression. We also discuss several important points and possible future directions.
