Different medical data mining approaches based prediction of ischemic stroke

AIM: Medical data mining (also called the knowledge discovery process in medicine) is the process of extracting patterns from large datasets. The current study assesses different medical data mining approaches for predicting ischemic stroke.

MATERIALS AND METHODS: The dataset, collected at Turgut Ozal Medical Centre, Inonu University, Malatya, Turkey, comprised the medical records of 80 patients and 112 healthy individuals, with 17 predictors and a target variable. The data mining approaches employed were support vector machine (SVM), stochastic gradient boosting (SGB) and penalized logistic regression (PLR). The 10-fold cross-validation resampling method was used, and the model performance metrics were accuracy, area under the ROC curve (AUC), sensitivity, specificity, positive predictive value and negative predictive value. Grid search was used to optimize the tuning parameters of the models.

RESULTS: The accuracy values with 95% CI were 0.9789 (0.9470-0.9942) for SVM, 0.9737 (0.9397-0.9914) for SGB and 0.8947 (0.8421-0.9345) for PLR. The AUC values with 95% CI were 0.9783 (0.9569-0.9997) for SVM, 0.9757 (0.9543-0.9970) for SGB and 0.8953 (0.8510-0.9396) for PLR.

CONCLUSIONS: The results of the current study demonstrated that SVM produced the best predictive performance among the three models according to the majority of evaluation metrics. The SVM and SGB models described in the current study could yield high predictive performance in the classification of ischemic stroke.
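The evaluation setup described in the methods can be illustrated with a minimal Python sketch. It is not the authors' code: the function names `stratified_kfold` and `classification_metrics` are hypothetical, the stratified fold assignment is an assumption (the abstract states only that 10-fold cross-validation was used), and the confusion-matrix counts are invented for illustration and are not the study's results. Only the class sizes (80 patients, 112 healthy individuals) come from the abstract. AUC is omitted because it requires predicted scores rather than counts.

```python
import random


def stratified_kfold(labels, k=10, seed=0):
    """Split sample indices into k folds, keeping class proportions similar."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        # Deal the shuffled indices of this class round-robin across folds.
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)
    return folds


def classification_metrics(tp, fp, tn, fn):
    """Compute the count-based metrics named in the abstract."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
        "ppv": tp / (tp + fp),          # positive predictive value
        "npv": tn / (tn + fn),          # negative predictive value
    }


# Class sizes from the abstract: 80 patients (1) and 112 healthy individuals (0).
labels = [1] * 80 + [0] * 112
folds = stratified_kfold(labels, k=10)

# Hypothetical confusion-matrix counts, for illustration only.
metrics = classification_metrics(tp=70, fp=8, tn=104, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

In a full pipeline, each fold would serve once as the held-out set while a model is tuned and fitted on the remaining nine, and the counts from the held-out predictions would be passed to `classification_metrics`.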
