Efficient handling of high-dimensional feature spaces by randomized classifier ensembles

Handling massive datasets is a difficult problem, not only because of the prohibitively large number of records but, in some cases, also because of the very high dimensionality of the data. Often, aggressive feature selection is performed to limit the number of attributes to a manageable size, which unfortunately can discard useful information. Feature space reduction may well be necessary for many stand-alone classifiers, but recent advances in ensemble classification indicate that accurate classifier aggregates can be learned even when each individual classifier operates on incomplete "feature view" training data, i.e., data from which certain input attributes have been excluded. In fact, by using only small random subsets of the features to build the individual component classifiers, surprisingly accurate and robust models can be obtained. In this work we demonstrate how such architectures effectively reduce the feature space for sub-models and groups of sub-models, which lends itself to efficient sequential and/or parallel implementations. Experiments with a randomized version of AdaBoost, using text classification as an example task, are reported to support our arguments.
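To make the idea concrete, the following is a minimal Python sketch (not the authors' exact algorithm) of an AdaBoost-style ensemble in which every weak learner is trained on a small random subset of the features, so each boosting round works in a reduced feature space. The subset size n_sub, the decision-stump base learner, and the binary {-1, +1} label encoding are illustrative assumptions rather than details taken from the paper; NumPy and scikit-learn are assumed to be available.

import numpy as np
from sklearn.tree import DecisionTreeClassifier


def fit_random_subspace_adaboost(X, y, n_rounds=50, n_sub=10, seed=0):
    """Train weak learners on random feature subsets with AdaBoost weighting.

    Assumes X is a dense (n_samples, n_features) array and y takes values in {-1, +1}.
    """
    rng = np.random.default_rng(seed)
    n_samples, n_features = X.shape
    w = np.full(n_samples, 1.0 / n_samples)       # example weights
    ensemble = []                                  # list of (feature_idx, stump, alpha)

    for _ in range(n_rounds):
        # Each component classifier sees only a small random "feature view".
        idx = rng.choice(n_features, size=min(n_sub, n_features), replace=False)
        stump = DecisionTreeClassifier(max_depth=1)
        stump.fit(X[:, idx], y, sample_weight=w)

        pred = stump.predict(X[:, idx])
        err = np.sum(w * (pred != y)) / np.sum(w)
        if err >= 0.5:                             # no better than chance: skip this round
            continue
        err = max(err, 1e-10)
        alpha = 0.5 * np.log((1.0 - err) / err)    # weight of this component classifier

        # Re-weight examples: misclassified points gain weight for the next round.
        w *= np.exp(-alpha * y * pred)
        w /= w.sum()
        ensemble.append((idx, stump, alpha))

    return ensemble


def predict(ensemble, X):
    """Weighted vote of the component classifiers, each applied to its own feature view."""
    score = np.zeros(X.shape[0])
    for idx, stump, alpha in ensemble:
        score += alpha * stump.predict(X[:, idx])
    return np.sign(score)

Because each component classifier only ever touches its own n_sub columns, the rounds can be trained against much smaller data slices, which is what makes sequential implementations cheap and groups of sub-models easy to distribute across processors. For a text classification setting such as the one used in the experiments, X would typically be a document-term matrix; the sketch above assumes a dense array for simplicity.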
