A Hybridized Feature Selection and Extraction Approach for Enhancing Cancer Prediction Based on DNA Methylation

Because aberrant DNA methylation plays a vital role in the development of diseases such as cancer, understanding its mechanism has become essential in recent years for early detection and diagnosis. Despite the advent of high-throughput technologies, classification using DNA methylation data still faces several challenges. The high dimensionality and noisiness of DNA methylation data can degrade prediction accuracy. It is therefore increasingly important to employ robust computational tools, such as feature selection and feature extraction methods, to identify the informative features among thousands of candidates and thereby improve cancer prediction. Using the degree of DNA methylation in promoter and probe regions, this paper aims to predict cancer with a hybridized approach that combines feature selection and feature extraction techniques. The suggested approach uses a filter feature selection method, the F-score, to address the high dimensionality of DNA methylation data. It also proposes an extraction model built on three novel feature extraction methods: the peaks of the mean methylation density, the fast Fourier transform algorithm, and the symmetry between the methylation density of a sample and the mean methylation density of each sample type (normal and cancer). The goal is to classify cancer accurately and to reduce training time. To evaluate the reliability of our approach, naïve Bayes, random forest, and support vector machine algorithms are used to predict different cancer types (breast, colon, head, kidney, lung, thyroid, and uterine) with and without the hybridized approach. The results show that classification accuracy improves in almost all cases, which also indirectly supports the reliability of the approach.
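The F-score filter step can be illustrated with a minimal sketch. It assumes the common two-class F-score definition (squared distances of the class means from the overall mean, divided by the sum of the per-class sample variances); the abstract does not state the exact formula used, so this may differ from the paper's variant. The function names `f_score` and `top_k_features` and the toy methylation values are hypothetical and chosen only for illustration.

```python
from statistics import mean, variance


def f_score(pos, neg):
    """F-score of one feature from its values in the two classes.

    Assumes each class has at least two samples, because the sample
    variance (n - 1 denominator) is undefined otherwise.
    """
    overall = mean(pos + neg)
    between = (mean(pos) - overall) ** 2 + (mean(neg) - overall) ** 2
    within = variance(pos) + variance(neg)
    # A constant feature has zero within-class variance; score it as 0.
    return between / within if within > 0 else 0.0


def top_k_features(samples, labels, k):
    """Return the indices of the k features with the highest F-score.

    samples: list of rows, one per sample, each a list of methylation values
    labels:  list with 1 (cancer) or 0 (normal) for each row
    """
    n_features = len(samples[0])
    scores = []
    for j in range(n_features):
        pos = [row[j] for row, y in zip(samples, labels) if y == 1]
        neg = [row[j] for row, y in zip(samples, labels) if y == 0]
        scores.append(f_score(pos, neg))
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return ranked[:k]


# Toy data (hypothetical): 4 samples x 3 features, values in [0, 1].
samples = [
    [0.90, 0.50, 0.10],
    [0.80, 0.40, 0.20],
    [0.20, 0.50, 0.15],
    [0.10, 0.45, 0.12],
]
labels = [1, 1, 0, 0]
selected = top_k_features(samples, labels, k=1)
```

In this toy example only the first feature separates the two classes clearly, so it receives by far the highest score. In a filter approach of this kind, the ranking is computed once, independently of the classifier, and the retained features are then passed on to the extraction and classification stages.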
