Cervical Cancer Identification with Synthetic Minority Oversampling Technique and PCA Analysis using Random Forest Classifier

Cervical cancer is the fourth most common malignant disease among women worldwide. In most cases, its symptoms are not noticeable in the early stages. Several factors increase the risk of developing cervical cancer, such as human papillomavirus, sexually transmitted diseases, and smoking. Identifying these factors and building a classification model that determines whether or not a case is cervical cancer remains a challenging research problem. This study aims to use cervical cancer risk factors to build a classification model using the Random Forest (RF) classification technique with the synthetic minority oversampling technique (SMOTE) and two feature reduction techniques: recursive feature elimination and principal component analysis (PCA). Most medical data sets are imbalanced because the number of patients is considerably smaller than the number of non-patients. SMOTE is used to address the imbalance in the data set used here. The data set comprises 32 risk factors and four target variables: Hinselmann, Schiller, Cytology, and Biopsy. The accuracy, sensitivity, specificity, PPA, and NPA of the four variables remain accurate after SMOTE when compared with the values obtained before SMOTE. An RSOnto ontology has been created to visualize the progress in classification performance.
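The oversampling step mentioned above can be illustrated with a minimal sketch. It is not the authors' implementation: it is a simplified SMOTE variant in plain Python that picks a random minority sample, chooses one of its k nearest minority neighbours, and interpolates between the two. The function name `smote`, its parameters, and the toy feature values are assumptions made for illustration only.

```python
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic minority samples by linear interpolation
    between a minority sample and one of its k nearest minority neighbours.
    Simplified sketch; the function name and parameters are hypothetical."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)

    def sq_dist(a, b):
        # Squared Euclidean distance between two feature vectors.
        return sum((x - y) ** 2 for x, y in zip(a, b))

    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Indices of the k nearest minority neighbours of the base sample.
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: sq_dist(base, minority[j]),
        )[:k]
        neighbour = minority[rng.choice(neighbours)]
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy, invented data: 10 majority (non-patient) rows and 3 minority (patient) rows.
majority = [[float(i), float(i % 3)] for i in range(10)]
minority = [[1.0, 5.0], [2.0, 6.0], [3.0, 5.5]]

# Generate enough synthetic minority rows to match the majority class size.
new_points = smote(minority, n_new=len(majority) - len(minority), k=2)
balanced_minority = minority + new_points
```

Each synthetic row lies on a line segment between two real minority rows, so the minority class grows without duplicating records exactly. In practice, oversampling is usually applied to the training split only, so that evaluation metrics such as sensitivity and specificity are computed on untouched data.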
