Using Random Forests for Handwritten Digit Recognition

In the field of pattern recognition, multiple classifier systems, and particularly bagging, boosting, and random subspaces, have attracted growing interest in recent years. These methods induce an ensemble of classifiers by introducing diversity at different levels. Following this principle, Breiman introduced in 2001 another family of methods called random forests. Our work studies these methods from a strictly pragmatic standpoint, with the goal of providing practitioners with rules for parameter setting. To that end, we have experimented with the Forest-RI algorithm, regarded as the reference random forest method, on the MNIST handwritten digit database. In this paper, we describe random forest principles and review some methods proposed in the literature. We then present our experimental protocol and results, and finally draw conclusions on the global behavior of random forests with respect to their parameter tuning.
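To make the ensemble principle concrete, here is a minimal, self-contained sketch of the two sources of randomness that Forest-RI combines: bagging (each tree is grown on a bootstrap sample) and random feature selection at each split (each tree considers only K randomly chosen features), with predictions made by majority vote. This is an illustrative toy, not the paper's implementation: the trees are reduced to depth-1 stumps, the data are synthetic two-class points rather than MNIST digits, and all names (`train_stump`, `train_forest`, `K`) are invented for this sketch.

```python
import random
from collections import Counter

random.seed(0)

# Toy 2-class data: class 0 clusters near 0, class 1 near 1, in d dimensions.
def make_data(n=200, d=8):
    X, y = [], []
    for i in range(n):
        label = i % 2
        X.append([label + random.gauss(0, 0.3) for _ in range(d)])
        y.append(label)
    return X, y

def train_stump(X, y, k):
    """Fit a depth-1 tree: among k randomly chosen features, pick the
    (feature, threshold, labels) split with the fewest training errors."""
    d = len(X[0])
    best = None  # (errors, feature, threshold, left_label, right_label)
    for f in random.sample(range(d), k):  # random feature subset (Forest-RI idea)
        for t in sorted(set(x[f] for x in X)):
            left = [yi for x, yi in zip(X, y) if x[f] <= t]
            right = [yi for x, yi in zip(X, y) if x[f] > t]
            for left_label in (0, 1):
                right_label = 1 - left_label
                errors = sum(yi != left_label for yi in left) + \
                         sum(yi != right_label for yi in right)
                if best is None or errors < best[0]:
                    best = (errors, f, t, left_label, right_label)
    return best[1:]

def train_forest(X, y, n_trees=25, k=3):
    forest = []
    n = len(X)
    for _ in range(n_trees):
        idx = [random.randrange(n) for _ in range(n)]  # bootstrap sample (bagging)
        Xb, yb = [X[i] for i in idx], [y[i] for i in idx]
        forest.append(train_stump(Xb, yb, k))
    return forest

def predict(forest, x):
    # Majority vote over all trees in the ensemble.
    votes = Counter()
    for f, t, left_label, right_label in forest:
        votes[left_label if x[f] <= t else right_label] += 1
    return votes.most_common(1)[0][0]

X, y = make_data()
forest = train_forest(X, y)
accuracy = sum(predict(forest, x) == yi for x, yi in zip(X, y)) / len(X)
print(f"training accuracy: {accuracy:.2f}")
```

Each individual stump is a weak, biased classifier, but because the bootstrap samples and feature subsets differ from tree to tree, their errors are partly decorrelated and the majority vote is substantially more accurate than any single stump. The two parameters exposed here, the number of trees (`n_trees`) and the number of features tried per split (`k`), are precisely the hyperparameters whose tuning the paper investigates.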
