Discover regulatory DNA elements using chromatin signatures and artificial neural network

MOTIVATION: Recent large-scale chromatin state mapping efforts have revealed characteristic chromatin modification signatures for various types of functional DNA elements. Given the important influence of chromatin states on gene regulation and the rapid accumulation of genome-wide chromatin modification data, there is a pressing need for computational methods that analyze these data to identify functional DNA elements. However, existing computational tools do not exploit data transformation and feature extraction as a means to achieve more accurate predictions.

RESULTS: We introduce a new computational framework for identifying functional DNA elements using chromatin signatures. The framework consists of a data transformation step and a feature extraction step, followed by a classification step that uses a time-delay neural network. We implemented the framework in a software tool, CSI-ANN (chromatin signature identification by artificial neural network). When applied to predict transcriptional enhancers in the ENCODE region, CSI-ANN achieved a 65.5% sensitivity and a 66.3% positive predictive value, a 5.9% and 11.6% improvement, respectively, over the previously best approach.

AVAILABILITY AND IMPLEMENTATION: CSI-ANN is implemented in Matlab. The source code is freely available at http://www.medicine.uiowa.edu/Labs/tan/CSIANNsoft.zip

CONTACT: kai-tan@uiowa.edu

SUPPLEMENTARY INFORMATION: Supplementary Materials are available at Bioinformatics online.
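For readers less familiar with the two evaluation metrics reported above, the following is a minimal sketch of their standard definitions: sensitivity is the fraction of true elements that are recovered, and positive predictive value (PPV) is the fraction of predictions that are correct. The function name and the counts are hypothetical and chosen only for illustration; they are not taken from CSI-ANN or from the ENCODE evaluation.

```python
def sensitivity_and_ppv(true_positives, false_negatives, false_positives):
    """Return (sensitivity, PPV) computed from raw prediction counts.

    sensitivity = TP / (TP + FN)  -> share of real elements that were predicted
    PPV         = TP / (TP + FP)  -> share of predictions that are real elements
    """
    actual_positives = true_positives + false_negatives
    predicted_positives = true_positives + false_positives
    if actual_positives == 0 or predicted_positives == 0:
        raise ValueError("need at least one actual and one predicted positive")
    sensitivity = true_positives / actual_positives
    ppv = true_positives / predicted_positives
    return sensitivity, ppv


# Hypothetical counts, for illustration only (not results from the paper).
sens, ppv = sensitivity_and_ppv(true_positives=50,
                                false_negatives=50,
                                false_positives=25)
print(f"sensitivity={sens:.3f}, PPV={ppv:.3f}")
```

Reporting both numbers together matters because either one can be inflated on its own: predicting many regions raises sensitivity at the cost of PPV, while predicting only a few high-confidence regions does the opposite.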
