Analysis of Risk Factors for Cervical Cancer Based on Machine Learning Methods

Cervical cancer is one of the diseases threatening women’s health worldwide, and it is hard to observe any signs in its early stages. This paper applies three methods to analyze a cervical cancer dataset: SVM (Support Vector Machine), XGBoost (eXtreme Gradient Boosting), and Random Forest. The dataset contains 32 risk factors and four target variables: Hinselmann, Schiller, Cytology, and Biopsy. The diagnostic results for these four target variables were classified using the three methods above. Finally, the five risk factors that most affect the diagnosis were identified, and the classification results showed that XGBoost and Random Forest perform better than SVM.
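
The workflow summarized above (train three classifiers on one target, compare them, then rank the risk factors) might be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the data are synthetic (generated with scikit-learn's `make_classification`), the feature names `factor_0` ... `factor_31` are hypothetical, scikit-learn's `GradientBoostingClassifier` stands in for XGBoost so the sketch needs only one library, and ranking by the Random Forest's impurity-based importances is an assumed choice, since the abstract does not say how the top five factors were selected.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the real data: 32 features and one binary target
# (playing the role of a single diagnosis column such as Biopsy).
# Sample count, class weights, and informative-feature count are assumptions.
X, y = make_classification(
    n_samples=600, n_features=32, n_informative=5, n_redundant=0,
    weights=[0.9, 0.1], random_state=0,
)
feature_names = [f"factor_{i}" for i in range(X.shape[1])]  # hypothetical names

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

models = {
    # SVMs are sensitive to feature scale, so standardize first.
    "SVM": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
    # Stand-in for XGBoost (assumption): a different gradient-boosting implementation.
    "Gradient boosting": GradientBoostingClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=200, random_state=0),
}

# Fit each model and record its accuracy on the held-out split.
accuracy = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    accuracy[name] = model.score(X_test, y_test)
    print(f"{name}: test accuracy = {accuracy[name]:.3f}")

# Rank features by the forest's impurity-based importances, highest first.
forest = models["Random Forest"]
top_five_idx = np.argsort(forest.feature_importances_)[::-1][:5]
top_five = [feature_names[i] for i in top_five_idx]
print("Top five features by importance:", top_five)
```

To cover all four target variables, the same loop would be repeated once per target column. On a class-imbalanced target like the synthetic one here, accuracy alone can be misleading, so a metric such as recall or AUC may be a useful complement.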
