Evaluation of machine learning based optimized feature selection approaches and classification methods for cervical cancer prediction

Cervical cancer is a gynaecological cancer, and most cases are associated with human papillomavirus (HPV) infection. Numerous risk factors are associated with cervical cancer, and recognizing which test variables are significant is important for categorizing patients based on their results. This work aims to gain a deeper understanding of these risk factors by analyzing them with machine learning techniques in R. Several feature selection techniques are explored to identify the attributes most important for cervical cancer prediction. Significant features are identified over multiple iterations of model training with several feature selection methods, and an optimized feature selection model is formed from them. In addition, classifier models are built using the C5.0, random forest, rpart, KNN and SVM algorithms. Training and performance evaluation options were explored as extensively as possible for all models. The performance and prediction accuracy of these algorithms are discussed based on the results obtained. Overall, the C5.0 and random forest classifiers performed reasonably well, with good overall accuracy in identifying women exhibiting clinical signs of cervical cancer.
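As a rough illustration of how an iterative, wrapper-style feature selection loop can be coupled to a classifier, the following minimal Python sketch greedily adds features to a k-nearest-neighbour (KNN) model for as long as leave-one-out accuracy improves. It is not the procedure used in this work, which was carried out in R with several feature selection methods. The forward-selection strategy, the leave-one-out criterion, the function names (`knn_predict`, `loo_accuracy`, `forward_selection`) and the toy data are all hypothetical assumptions chosen for illustration; the toy rows are synthetic and are not drawn from any cervical cancer dataset.

```python
import math
from collections import Counter

# Hypothetical toy data (NOT a cervical cancer dataset): feature 0 separates
# the two classes, while features 1 and 2 are unrelated noise.
TOY_X = [
    [0.1, 3.0, 7.0],
    [0.5, 9.0, 1.0],
    [0.9, 1.0, 4.0],
    [0.3, 6.0, 8.0],
    [5.2, 2.0, 7.5],
    [5.8, 8.0, 0.5],
    [5.5, 4.0, 3.5],
    [6.0, 7.0, 9.0],
]
TOY_Y = [0, 0, 0, 0, 1, 1, 1, 1]


def knn_predict(train_X, train_y, x, features, k=3):
    """Majority vote among the k nearest training rows, using Euclidean
    distance computed only on the selected feature indices."""
    dists = sorted(
        (math.sqrt(sum((row[f] - x[f]) ** 2 for f in features)), label)
        for row, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


def loo_accuracy(X, y, features, k=3):
    """Leave-one-out accuracy of the KNN model restricted to `features`."""
    correct = 0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        if knn_predict(train_X, train_y, X[i], features, k) == y[i]:
            correct += 1
    return correct / len(X)


def forward_selection(X, y, k=3):
    """Greedy wrapper selection (an assumed strategy): on each iteration add the
    feature that most improves leave-one-out accuracy; stop when none does."""
    selected, best_acc = [], 0.0
    while True:
        candidates = [c for c in range(len(X[0])) if c not in selected]
        if not candidates:
            break
        # max() with a key returns the first best candidate, so ties go to
        # the lowest feature index.
        acc, f = max(
            ((loo_accuracy(X, y, selected + [c], k), c) for c in candidates),
            key=lambda t: t[0],
        )
        if acc <= best_acc:
            break
        selected.append(f)
        best_acc = acc
    return selected, best_acc


if __name__ == "__main__":
    print(forward_selection(TOY_X, TOY_Y))
```

On the toy data only feature 0 separates the classes, so the loop returns `[0]` with a leave-one-out accuracy of 1.0 and then stops, because no further feature can improve on it. Forward selection is only one wrapper strategy; a filter score or random-forest-based importance could replace the accuracy criterion, and the KNN model could be swapped for any of the other classifiers named above.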
