A Survey and Compare the Performance of IBM SPSS Modeler and Rapid Miner Software for Predicting Liver Disease by Using Various Data Mining Algorithms

Abstract. With industrial development and an increasingly mechanized lifestyle, the prevalence of disease is rising steadily, and the number of patients with liver diseases (such as fatty liver, cirrhosis, and liver cancer) is rising with it. Because prevention is better than treatment and early diagnosis helps the treatment process, it is essential to develop methods for detecting high-risk individuals who are likely to develop liver disease, and to adopt appropriate approaches for early diagnosis and for starting treatment in the early stages of the disease. In this study, we applied data mining techniques that are now commonly used in the diagnosis and treatment of other diseases to the diagnosis and treatment of liver disease. For this purpose, we used two data mining tools, Rapid Miner and IBM SPSS Modeler. We examined the accuracy of several data mining algorithms, including C5.0, C4.5, decision tree, and neural network, in both tools for predicting the prevalence of these diseases or diagnosing them early. According to the results, the C4.5 and C5.0 algorithms, run with the IBM SPSS Modeler and Rapid Miner tools, achieved accuracies of 72.37% and 87.91%, respectively. In addition, the neural network algorithm in Rapid Miner was able to show more detail.
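The comparison above rests on classification accuracy, that is, the fraction of patients whose predicted class matches the recorded diagnosis. The following is a minimal sketch of that calculation in plain Python. The labels, the `accuracy` helper, and the two prediction lists are hypothetical and purely illustrative; they are not the study's data and do not reproduce the reported 72.37% or 87.91%.

```python
from typing import Sequence


def accuracy(actual: Sequence[str], predicted: Sequence[str]) -> float:
    """Return the fraction of predictions that match the actual labels."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("at least one record is required")
    correct = sum(a == p for a, p in zip(actual, predicted))
    return correct / len(actual)


# Hypothetical diagnoses for eight patients (illustrative only).
actual = ["disease", "disease", "healthy", "healthy",
          "disease", "healthy", "disease", "healthy"]

# Hypothetical outputs of two classifiers evaluated on the same patients.
model_a_predicted = ["disease", "healthy", "healthy", "healthy",
                     "disease", "disease", "disease", "healthy"]
model_b_predicted = ["disease", "disease", "healthy", "healthy",
                     "disease", "healthy", "healthy", "healthy"]

for name, predicted in [("model A", model_a_predicted),
                        ("model B", model_b_predicted)]:
    print(f"{name}: accuracy = {accuracy(actual, predicted):.2%}")
```

Accuracy alone can mislead when one class dominates the data set, so a fuller comparison would typically also report per-class measures such as precision and recall.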
