Auto-MeDiSine: an auto-tunable medical decision support engine using an automated class outlier detection method and AutoMLP

With advanced data analysis techniques, efforts to build more accurate decision support systems for disease prediction are increasing. According to the World Health Organization, diabetes-related illnesses and mortality are on the rise; hence, early diagnosis is particularly important. In this paper, we present a framework, Auto-MeDiSine, that combines an automated version of enhanced class outlier detection using a distance-based algorithm (AutoECODB) with an ensemble of automatic multilayer perceptrons (AutoMLPs). AutoECODB builds on ECODB by automating parameter tuning to optimize the outlier detection process, and it cleanses the dataset by removing outliers. The preprocessed dataset is then used to train a prediction model using an ensemble of AutoMLPs. Three sets of experiments are performed on the publicly available Pima Indian Diabetes Dataset: (1) Auto-MeDiSine is compared with other state-of-the-art methods reported in the literature, where Auto-MeDiSine achieves an accuracy of 88.7%; (2) AutoMLP is compared with other learners, including individual learners (with a focus on neural network-based learners) and ensemble learners; and (3) AutoECODB is compared with other preprocessing methods. Furthermore, to validate the generality of the framework, Auto-MeDiSine is tested on another publicly available dataset, the BioStat Diabetes Dataset, where it outperforms the existing reported results, reaching an accuracy of 97.1%.
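The cleansing step can be illustrated with a minimal sketch. It is not AutoECODB or ECODB: the function `knn_class_filter`, its parameters `k` and `max_disagree`, and the toy data are all hypothetical, and the scoring rule (the share of nearest neighbours carrying a different label) is a simplified stand-in for a distance-based class outlier score. Automated parameter tuning and the AutoMLP ensemble are not shown.

```python
import math


def knn_class_filter(rows, labels, k=3, max_disagree=0.5):
    """Return indices of rows to keep.

    A row is dropped when more than `max_disagree` of its k nearest
    neighbours (Euclidean distance) carry a different class label.
    Simplified illustration only; not the ECODB scoring formula.
    """
    keep = []
    for i, row in enumerate(rows):
        # Distances from row i to every other row, nearest first.
        dists = sorted(
            (math.dist(row, other), j)
            for j, other in enumerate(rows)
            if j != i
        )
        neighbours = [j for _, j in dists[:k]]
        disagree = sum(labels[j] != labels[i] for j in neighbours)
        # Keep rows with no neighbours, or with enough same-class neighbours.
        if not neighbours or disagree / len(neighbours) <= max_disagree:
            keep.append(i)
    return keep


# Toy data: two well-separated clusters, plus one point (index 8)
# labelled as class 1 but sitting inside the class-0 cluster.
rows = [(0, 0), (0, 1), (1, 0), (1, 1),
        (10, 10), (10, 11), (11, 10), (11, 11),
        (0.5, 0.5)]
labels = [0, 0, 0, 0, 1, 1, 1, 1, 1]

kept = knn_class_filter(rows, labels, k=3)
clean_rows = [rows[i] for i in kept]
clean_labels = [labels[i] for i in kept]
```

In this toy case the point at index 8 is dropped, because all three of its nearest neighbours belong to the other class. The surviving rows would then be passed to the training stage.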
