An Experimental Study on Rotation Forest Ensembles

Rotation Forest is a recently proposed method for building classifier ensembles using independently trained decision trees. It was found to be more accurate than bagging, AdaBoost and Random Forest ensembles across a collection of benchmark data sets. This paper carries out a lesion study on Rotation Forest in order to find out which of the parameters and the randomization heuristics are responsible for its good performance. Contrary to common intuition, the features extracted through PCA gave better results than those extracted through non-parametric discriminant analysis (NDA) or random projections. The only ensemble method whose accuracy was statistically indistinguishable from that of Rotation Forest was LogitBoost, although it gave slightly inferior results on 20 out of the 32 benchmark data sets. The main factor behind the success of Rotation Forest appears to be that the transformation matrix employed to calculate the (linear) extracted features is sparse.
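
To make the construction under study concrete, the following is a minimal Python sketch of the Rotation Forest idea, assuming numpy and scikit-learn are available. The function names (`build_rotation_forest`, `rotation_forest_predict`) and parameter defaults are illustrative only; the full algorithm of Rodríguez et al. additionally draws each bootstrap sample from a random subset of the classes before fitting PCA, which is omitted here for brevity.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.tree import DecisionTreeClassifier

def build_rotation_forest(X, y, n_trees=10, n_subsets=3, seed=None):
    """Train a Rotation Forest sketch: each tree sees the data rotated by a
    sparse, block-diagonal matrix assembled from per-subset PCA loadings."""
    rng = np.random.default_rng(seed)
    n_features = X.shape[1]
    forest = []
    for _ in range(n_trees):
        # Randomly split the feature indices into disjoint subsets.
        subsets = np.array_split(rng.permutation(n_features), n_subsets)
        # The rotation matrix is sparse: PCA loadings fill one diagonal
        # block per feature subset; all other entries stay zero.
        R = np.zeros((n_features, n_features))
        for subset in subsets:
            # Fit PCA on a bootstrap sample (the full method bootstraps
            # from a random class subset; plain bootstrap used here).
            boot = rng.choice(len(X), size=len(X), replace=True)
            pca = PCA(n_components=len(subset)).fit(X[boot][:, subset])
            R[np.ix_(subset, subset)] = pca.components_.T
        # Each tree is trained independently on its own rotated data.
        tree = DecisionTreeClassifier().fit(X @ R, y)
        forest.append((R, tree))
    return forest

def rotation_forest_predict(forest, X):
    # Average the class-probability votes of the rotated trees.
    probs = np.mean([tree.predict_proba(X @ R) for R, tree in forest], axis=0)
    return forest[0][1].classes_[probs.argmax(axis=1)]
```

The block-diagonal layout of `R` is precisely the sparsity the abstract identifies as the main factor: each extracted feature is a linear combination of only the original features in its own subset.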
