Cervical Cancer Prediction through Different Screening Methods using Data Mining

Cervical cancer remains a major cause of death worldwide, largely because effective access to cervical screening is still a challenge. Data mining techniques, including decision tree algorithms, are used in biomedical research for predictive analysis. The imbalanced dataset used in this study was obtained from the dataset archive of the University of California, Irvine. The Synthetic Minority Oversampling Technique (SMOTE) was used to balance the dataset, which increased the number of instances. The dataset comprises patient age, number of pregnancies, contraceptive use, smoking patterns and chronological records of sexually transmitted diseases (STDs). The Microsoft Azure Machine Learning tool was used to simulate the results. This paper focuses on cervical cancer prediction through different screening methods using the boosted decision tree, decision forest and decision jungle algorithms; performance was evaluated on the basis of the area under the receiver operating characteristic curve (AUROC), accuracy, specificity and sensitivity. The 10-fold cross-validation method was used to validate the results, and the boosted decision tree gave the best results, reaching an AUROC of 0.978 when the Hinselmann screening method was used. The results obtained by the other classifiers were significantly worse than those of the boosted decision tree.
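The four evaluation measures named above can be illustrated with a minimal sketch in plain Python. It is not the Azure Machine Learning workflow used in the study: the function names, the 0.5 decision threshold and the toy labels and scores are assumptions made for illustration only. AUROC is computed here through its rank interpretation, that is, the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case.

```python
from typing import Dict, Sequence


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]):
    """Return (tp, fp, tn, fn) for binary labels coded as 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def auroc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    Tied scores count as half a correct pair.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def evaluate(y_true: Sequence[int], scores: Sequence[float],
             threshold: float = 0.5) -> Dict[str, float]:
    """Compute accuracy, sensitivity, specificity and AUROC.

    The threshold (an assumed default of 0.5) turns scores into class labels.
    Assumes both classes are present, so no denominator is zero.
    """
    y_pred = [1 if s >= threshold else 0 for s in scores]
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
        "auroc": auroc(y_true, scores),
    }


if __name__ == "__main__":
    # Toy data, invented for illustration; not taken from the study.
    labels = [1, 1, 1, 0, 0, 0, 0, 0]
    model_scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.1, 0.05]
    for name, value in evaluate(labels, model_scores).items():
        print(f"{name}: {value:.3f}")
```

Sensitivity and specificity depend on the chosen threshold, whereas AUROC summarizes ranking quality across all thresholds, which is one reason to report all of these measures together on an imbalanced screening dataset.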
