Computer-aided decision-making for predicting liver disease using PSO-based optimized SVM with feature selection

Abstract Medical data mining models have become an important means of predicting disease in recent years. Healthcare produces large volumes of data, which makes predicting and analyzing a target disease challenging. Data mining models can convert this data into valuable information, and logical, scientific analysis of that information supports accurate decision-making and prediction. A second challenge in disease prediction is identifying the features that matter most. Feature subset selection is performed to improve model accuracy. The purpose of this study is to select significant features by comparing data mining models for predicting liver disease, following an extraction, loading, transformation, analysis (ELTA) approach, in support of correct diagnosis. The models compared under the ELTA approach are random forest, Multi-Layer Perceptron (MLP) neural network, Bayesian network, Support Vector Machine (SVM), and Particle Swarm Optimization (PSO)-SVM. Among these, the PSO-SVM model performs best on the criteria of specificity, sensitivity, accuracy, Area Under the Curve (AUC), F-measure, precision, and False Positive Rate (FPR). The models were evaluated on a liver disease dataset using 10-fold cross-validation. The average estimated accuracy was 87.35%, 78.91%, 66.78%, 76.51%, and 95.17% for the random forest, MLP neural network, Bayesian network, SVM, and PSO-SVM models, respectively. On the evaluation criteria above, the hybrid PSO-SVM-based optimized model achieved the highest accuracy with the fewest features.
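Most of the evaluation criteria named in the abstract can be derived from a single binary confusion matrix. The following is a minimal sketch of those standard definitions, not the authors' code: the function name `binary_metrics` and the example counts are hypothetical and are not taken from the study. AUC is omitted because it requires ranked scores rather than one set of counts.

```python
def safe_div(numerator, denominator):
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def binary_metrics(tp, fp, tn, fn):
    """Compute standard binary-classification criteria from confusion-matrix counts.

    tp, fp, tn, fn are the counts of true positives, false positives,
    true negatives, and false negatives.
    """
    sensitivity = safe_div(tp, tp + fn)   # true positive rate (recall)
    specificity = safe_div(tn, tn + fp)   # true negative rate
    precision = safe_div(tp, tp + fp)     # positive predictive value
    accuracy = safe_div(tp + tn, tp + fp + tn + fn)
    fpr = safe_div(fp, fp + tn)           # false positive rate = 1 - specificity
    f_measure = safe_div(2 * precision * sensitivity, precision + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "accuracy": accuracy,
        "fpr": fpr,
        "f_measure": f_measure,
    }


# Hypothetical counts for illustration only; not results from the study.
metrics = binary_metrics(tp=80, fp=10, tn=90, fn=20)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Under 10-fold cross-validation, one would typically compute these values on each held-out fold and report the average across the ten folds.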
