Improved classification of medical data using abductive network committees trained on different feature subsets

This paper demonstrates the use of committees of abductive network classifiers trained on different features to improve classification accuracy in medical diagnosis. In an earlier publication, committee members were trained on different subsets of the training set to ensure enough diversity for improved committee performance. In situations characterized by high data dimensionality, i.e. a large number of features and relatively few training examples, it may be more advantageous to split the feature set rather than the training set. We describe a novel approach for tentatively ranking the features and forming subsets of uniform predictive quality for training individual members. The abductive network training algorithm is used to select the optimum predictors from the feature set at various levels of model complexity specified by the user. Using the resulting tentative ranking, the features are grouped into mutually exclusive subsets of approximately equal predictive power, and each subset is used to train one member. The approach is demonstrated on three standard medical diagnosis datasets (breast cancer, heart disease, and diabetes). Three-member committees trained on different feature subsets and using simple output combination methods reduce classification errors by up to 20% compared to the best single model developed with the full feature set. Results are compared with those reported previously for members trained by splitting the training set. Training abductive committee members on feature subsets of approximately equal predictive power achieves both the diversity and the quality needed for improved committee performance. Ensemble feature subset selection can be performed using GMDH-based learning algorithms. The approach should be advantageous in situations characterized by high data dimensionality.
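The grouping and combination steps can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state how ranked features are assigned to subsets or which combination rule is used, so the serpentine dealing order, output averaging, the 0.5 threshold, and the names `split_ranked_features` and `committee_average` are all assumptions made for illustration. The ranking itself is taken as given (best feature first), since in the paper it comes from the abductive network training algorithm.

```python
from statistics import mean


def split_ranked_features(ranked, n_members=3):
    """Deal ranked features (best first) into mutually exclusive subsets.

    Assumption: a serpentine (back-and-forth) dealing order is used so that
    no subset collects all of the top-ranked features. The source only says
    the subsets have approximately equal predictive power.
    """
    subsets = [[] for _ in range(n_members)]
    for i, feature in enumerate(ranked):
        round_no, pos = divmod(i, n_members)
        # Reverse direction on every other round to balance rank totals.
        idx = pos if round_no % 2 == 0 else n_members - 1 - pos
        subsets[idx].append(feature)
    return subsets


def committee_average(member_outputs, threshold=0.5):
    """Combine member outputs by simple averaging (assumed rule).

    Each output is treated as a score in [0, 1] for the positive class.
    """
    return int(mean(member_outputs) >= threshold)


# Hypothetical ranking of nine features, best first.
ranked = ["f1", "f2", "f3", "f4", "f5", "f6", "f7", "f8", "f9"]
subsets = split_ranked_features(ranked, n_members=3)
print(subsets)

# Hypothetical scores from three members, each trained on one subset.
print(committee_average([0.9, 0.4, 0.6]))
```

With nine ranked features and three members, the rank totals of the three subsets come out as 14, 15 and 16, which is one simple way to keep the subsets of comparable quality while leaving them disjoint.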
