Effective framework for prediction of disease outcome using medical datasets: clustering and classification

Processing data with two algorithms in a single workflow, and hence the combined method itself, is called hybrid computing. We propose a data mining framework comprising two stages: clustering and classification. The first stage applies the k-means algorithm to the data and generates two clusters, cluster-0 and cluster-1. Cluster-0 contains instances without disease symptoms and cluster-1 contains instances with disease symptoms. The validity of this grouping is then verified against the class labels in the original datasets. Incorrectly clustered instances are removed, and the remaining instances are used to build a classifier with the C4.5 decision-tree algorithm under k-fold cross-validation. The framework was tested on eight datasets from the UCI machine learning repository and evaluated in terms of accuracy, sensitivity and specificity. It obtained promising classification accuracy compared with other methods found in the literature.
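The first stage and the evaluation measures can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation: the function names, the toy data, and the rule for matching cluster ids to class labels (taking whichever mapping agrees with more labels) are assumptions made for illustration. The C4.5 stage is omitted; the retained instances would be passed to a decision-tree learner under k-fold cross-validation.

```python
import random


def kmeans_two_clusters(rows, iterations=50, seed=0):
    """Plain k-means with k=2; returns a cluster id (0 or 1) for each row."""
    rng = random.Random(seed)
    centroids = [list(r) for r in rng.sample(rows, 2)]
    assignment = [0] * len(rows)
    for _ in range(iterations):
        # Assign each row to the nearest centroid (squared Euclidean distance).
        new_assignment = [
            min((0, 1), key=lambda c: sum((x - m) ** 2
                                          for x, m in zip(row, centroids[c])))
            for row in rows
        ]
        # Recompute each centroid as the mean of its current members.
        for c in (0, 1):
            members = [r for r, a in zip(rows, new_assignment) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
        if new_assignment == assignment:
            break
        assignment = new_assignment
    return assignment


def filter_consistent(rows, labels, clusters):
    """Keep only instances whose cluster agrees with their class label."""
    # Assumption: k-means ids are arbitrary, so use the id-to-label mapping
    # (identity or flipped) that agrees with the larger number of labels.
    agree = sum(1 for c, y in zip(clusters, labels) if c == y)
    if agree < len(labels) - agree:
        clusters = [1 - c for c in clusters]
    return [(r, y) for r, y, c in zip(rows, labels, clusters) if c == y]


def evaluate(y_true, y_pred):
    """Accuracy, sensitivity and specificity for binary labels (1 = disease)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn) if tp + fn else float("nan"),
        "specificity": tn / (tn + fp) if tn + fp else float("nan"),
    }


# Hypothetical toy data: the last instance is labelled healthy (0) but lies
# in the "disease" group, so the filtering step should discard it.
rows = [[1.0, 1.1], [0.9, 1.0], [1.2, 0.8],
        [5.0, 5.2], [5.1, 4.9], [4.8, 5.0], [5.0, 5.0]]
labels = [0, 0, 0, 1, 1, 1, 0]

clusters = kmeans_two_clusters(rows)
kept = filter_consistent(rows, labels, clusters)
print(len(kept), "of", len(rows), "instances retained for classifier training")
print(evaluate([1, 1, 0, 0], [1, 0, 0, 0]))
```

Sensitivity here is the fraction of diseased instances predicted as diseased, and specificity is the fraction of healthy instances predicted as healthy.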
