An ensemble of classifiers approach for the missing feature problem

A new learning algorithm is introduced that can accommodate data with missing features. The algorithm uses an ensemble of classifiers, where each classifier in the ensemble is trained on a random subset of the available features. The approach rests on the basic assumption that some unknown subset of the features is in fact adequate for classification, or in other words, that the data contain redundant and possibly irrelevant features. This assumption holds for most practical applications. We show empirically that if a certain number of networks achieve a particular classification performance using all of the features, the same performance can be reached even when some features are missing, as long as the same number of usable networks can be generated from the remaining features. The proposed approach has its roots in the incremental learning algorithm Learn++, which seeks to learn new information from additional datasets that may later become available, even when such data introduce new classes. We have modified the Learn++ algorithm to address the missing feature problem. The proposed algorithm showed remarkable performance on three real-world applications, with up to 10% of the features missing in the validation / field data.
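To make the scheme concrete, below is a minimal sketch of the core idea: train each ensemble member on a random feature subset, and at test time let only those classifiers vote whose features are all observed. It assumes scikit-learn's MLPClassifier as the base learner and plain majority voting; the ensemble size, subset size, and network architecture are illustrative choices rather than the paper's settings, and the Learn++ distribution-update and weighted-voting machinery is omitted.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

def train_ensemble(X, y, n_classifiers=50, subset_size=5, seed=None):
    """Train each classifier on a random subset of the features.

    Returns a list of (feature_indices, fitted_classifier) pairs.
    """
    rng = np.random.default_rng(seed)
    n_features = X.shape[1]
    ensemble = []
    for _ in range(n_classifiers):
        feats = rng.choice(n_features, size=subset_size, replace=False)
        clf = MLPClassifier(hidden_layer_sizes=(20,), max_iter=500)
        clf.fit(X[:, feats], y)
        ensemble.append((feats, clf))
    return ensemble

def predict_with_missing(ensemble, x):
    """Classify one instance x; NaN marks a missing feature.

    Only classifiers whose entire feature subset is observed cast a vote.
    """
    votes = {}
    for feats, clf in ensemble:
        if np.isnan(x[feats]).any():
            continue  # this classifier depends on a missing feature; skip it
        label = clf.predict(x[feats].reshape(1, -1))[0]
        votes[label] = votes.get(label, 0) + 1
    if not votes:
        raise ValueError("no usable classifier: every subset hits a missing feature")
    return max(votes, key=votes.get)  # simple majority vote
```

An instance goes unclassified only when every feature subset in the ensemble touches at least one missing feature; with enough random subsets, a modest fraction of missing features still leaves many usable voters, which is the effect the abstract describes.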
