HMV: A medical decision support framework using multi-layer classifiers for disease prediction

Abstract

Decision support is a crucial function for decision-makers in many industries. Decision Support Systems (DSS) typically help decision-makers gather and interpret information and build a foundation for their decisions. Medical Decision Support Systems (MDSS) play an increasingly important role in medical practice: by assisting doctors in making clinical decisions, they are expected to improve the quality of medical care. Conventional clinical decision support systems rely on individual classifiers or on simple combinations of classifiers, which tend to show only moderate performance. In this research, a multi-layer classifier ensemble framework is proposed, based on an optimal combination of heterogeneous classifiers. The proposed model, named “HMV”, overcomes the performance bottlenecks of conventional approaches by using an ensemble of seven heterogeneous classifiers. The framework is evaluated on ten datasets obtained from public repositories: two heart disease, two breast cancer, two diabetes and two liver disease datasets, one Parkinson's disease dataset and one hepatitis dataset. The effectiveness of the proposed ensemble is investigated by comparing its results with those of several well-known classifiers and ensemble techniques. The experimental evaluation shows that the proposed framework handles all types of attributes and achieves high diagnostic accuracy. A case study based on a real-time medical dataset is also presented to demonstrate the performance and effectiveness of the proposed model.
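The abstract does not spell out how the seven classifiers are combined across layers. As a rough illustration only, the sketch below assumes a two-layer majority-voting scheme: in the first layer, groups of base classifiers each vote internally, and in the second layer, the group decisions are combined by a final vote. The grouping, the tie-breaking rule, the function names (`majority_vote`, `layered_vote`) and the toy threshold rules standing in for trained learners are all hypothetical. This is not the HMV method as published.

```python
from collections import Counter
from collections.abc import Callable, Hashable, Sequence

# Hypothetical interface: a base classifier maps one feature vector to a class label.
Classifier = Callable[[Sequence[float]], Hashable]


def majority_vote(labels: Sequence[Hashable]) -> Hashable:
    """Return the most frequent label; ties go to the label that appears first."""
    if not labels:
        raise ValueError("majority_vote needs at least one label")
    return Counter(labels).most_common(1)[0][0]


def layered_vote(x: Sequence[float], groups: Sequence[Sequence[Classifier]]) -> Hashable:
    """Two-layer majority voting over groups of base classifiers (assumed scheme)."""
    # Layer 1: every base classifier in a group predicts, and the group votes internally.
    group_decisions = [majority_vote([clf(x) for clf in group]) for group in groups]
    # Layer 2: the per-group decisions are combined by a final majority vote.
    return majority_vote(group_decisions)


# Seven toy threshold rules used as placeholders for seven heterogeneous,
# trained base learners; they are not real models.
BASE: list[Classifier] = [
    lambda x: int(x[0] > 0.5),
    lambda x: int(x[1] > 0.5),
    lambda x: int(x[0] + x[1] > 1.0),
    lambda x: int(x[0] > 0.3),
    lambda x: int(x[1] > 0.7),
    lambda x: int(max(x) > 0.8),
    lambda x: int(min(x) > 0.2),
]

# Hypothetical grouping: odd group sizes (3, 3, 1) and an odd number of groups,
# so binary votes can never tie in either layer.
GROUPS = [BASE[0:3], BASE[3:6], BASE[6:7]]

if __name__ == "__main__":
    for sample in [(0.9, 0.9), (0.1, 0.1), (0.4, 0.3)]:
        print(sample, layered_vote(sample, GROUPS))
```

For the input `(0.4, 0.3)`, the third group votes 1 but the first two vote 0, so the second layer returns 0. In a real system, each lambda would be replaced by a fitted model's prediction function.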
