Attribute bagging: improving accuracy of classifier ensembles by using random feature subsets

We present attribute bagging (AB), a technique for improving the accuracy and stability of classifier ensembles induced using random subsets of features. AB is a wrapper method that can be used with any learning algorithm. It establishes an appropriate attribute subset size and then randomly selects subsets of features, creating projections of the training set on which the ensemble classifiers are built. The induced classifiers are then used for voting. This article compares the performance of our AB method with bagging and other algorithms on a hand-pose recognition dataset. We show that AB gives consistently better results than bagging, in both accuracy and stability. We also test and discuss the performance of ensemble voting in bagging and AB, for both weighted and unweighted voting, as a function of the attribute subset size and the number of voters. Finally, we demonstrate that ranking the attribute subsets by their classification accuracy and voting using only the best subsets further improves the performance of the ensemble.
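The following is a minimal sketch of the attribute-bagging idea described above, not the paper's exact procedure: it trains base classifiers on random attribute subsets, ranks the subsets by held-out accuracy, and takes an unweighted majority vote over the best ones. The function names, the decision-tree base learner, the subset size `k`, the `keep_best` cutoff, and the use of a simple validation split for ranking are all illustrative assumptions.

```python
# Illustrative sketch of attribute bagging; parameter choices and the
# ranking/validation scheme are assumptions, not the paper's settings.
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split

def attribute_bagging_fit(X, y, n_classifiers=25, k=5, keep_best=15, seed=0):
    """Train classifiers on random attribute subsets and keep the best-scoring ones."""
    rng = np.random.default_rng(seed)
    X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.25, random_state=seed)
    models = []
    for _ in range(n_classifiers):
        subset = rng.choice(X.shape[1], size=k, replace=False)      # random attribute subset
        clf = DecisionTreeClassifier(random_state=seed).fit(X_tr[:, subset], y_tr)
        acc = clf.score(X_val[:, subset], y_val)                    # rank subsets by accuracy
        models.append((acc, subset, clf))
    models.sort(key=lambda m: m[0], reverse=True)
    return models[:keep_best]                                       # vote with only the best subsets

def attribute_bagging_predict(models, X):
    """Unweighted majority vote over the retained classifiers (integer class labels assumed)."""
    votes = np.array([clf.predict(X[:, subset]) for _, subset, clf in models])
    return np.array([np.bincount(col).argmax() for col in votes.T])
```

A weighted-voting variant would simply weight each classifier's vote by its validation accuracy `acc` instead of counting votes equally.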
