Employing Genetic Algorithm to Construct Epigenetic Tree-Based Features for Enhancer Region Prediction

This paper presents a genetic algorithm (GA)-based method that generates novel logic-based features, represented as parse trees, from DNA sequences enriched with H3K4me1 histone signatures. Current methods, which mostly rely on k-mer content features, cannot represent the possibly complex interactions among DNA segments in H3K4me1 regions. We hypothesize that modeling such interactions is important for recognizing H3K4me1 marks. Our proposed method employs a tree structure to model the logical relationships between k-mers from the marks. To benchmark the generated features, we compare them with the commonly used k-mer content features on the mouse (mm9) genome dataset. Our results show that the logical rule features improve performance in terms of F-measure on all the datasets tested.
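
To make the contrast between the two feature types concrete, the following is a minimal illustrative sketch, not the authors' implementation. It assumes a parse tree can be encoded as nested tuples with hypothetical AND, OR, and NOT operators over k-mer presence tests; the abstract does not specify the operator set, the tree encoding, or how the GA evolves the trees, so the names `kmer_counts` and `evaluate` and the example tree are assumptions made for illustration only.

```python
from itertools import product


def kmer_counts(seq, k):
    """Baseline k-mer content features: occurrence count of every possible k-mer."""
    counts = {"".join(p): 0 for p in product("ACGT", repeat=k)}
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:  # skip windows containing non-ACGT symbols such as N
            counts[kmer] += 1
    return counts


def evaluate(tree, seq):
    """Evaluate a logical parse tree (hypothetical encoding) on a DNA sequence.

    A tree is a nested tuple:
      ("KMER", "ACG")        -> True if the k-mer occurs in seq
      ("NOT", subtree)       -> logical negation
      ("AND", left, right)   -> logical conjunction
      ("OR", left, right)    -> logical disjunction
    """
    op = tree[0]
    if op == "KMER":
        return tree[1] in seq
    if op == "NOT":
        return not evaluate(tree[1], seq)
    if op == "AND":
        return evaluate(tree[1], seq) and evaluate(tree[2], seq)
    if op == "OR":
        return evaluate(tree[1], seq) or evaluate(tree[2], seq)
    raise ValueError(f"unknown operator: {op}")


# Hypothetical tree feature: "contains ACG and does not contain TTT".
example_tree = ("AND", ("KMER", "ACG"), ("NOT", ("KMER", "TTT")))

if __name__ == "__main__":
    for s in ("GGACGTA", "ACGTTT"):
        print(s, evaluate(example_tree, s), kmer_counts(s, 2)["AC"])
```

In this sketch each tree yields one binary feature per sequence, so a set of trees would produce a feature vector that could be passed to a classifier in place of, or alongside, the k-mer count vector. A GA would search over such trees, which this example does not attempt to show.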
