Mondrian Forests: Efficient Online Random Forests

Ensembles of randomized decision trees, usually referred to as random forests, are widely used for classification and regression tasks in machine learning and statistics. Random forests achieve competitive predictive performance and are computationally efficient to train and test, making them excellent candidates for real-world prediction tasks. The most popular random forest variants (such as Breiman's random forest and extremely randomized trees) operate on batches of training data, yet online methods are in increasing demand. Existing online random forests, however, require more training data than their batch counterparts to achieve comparable predictive performance. In this work, we use Mondrian processes (Roy and Teh, 2009) to construct ensembles of random decision trees we call Mondrian forests. Mondrian forests can be grown in an incremental/online fashion and, remarkably, the distribution of online Mondrian forests is the same as that of batch Mondrian forests. Mondrian forests achieve predictive performance comparable to that of existing online random forests and periodically retrained batch random forests, while being more than an order of magnitude faster, thus representing a better computation-versus-accuracy tradeoff.
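The abstract describes the construction only at a high level. As a rough illustration of the batch case, the sketch below samples a single Mondrian tree: split times are drawn from an exponential distribution whose rate is the sum of the side lengths of the data's bounding box, the split dimension is chosen with probability proportional to its extent, and the split location is uniform within that dimension's data range. This is a minimal sketch under assumed simplifications; the function names, `budget` parameter, and dict-based tree representation are illustrative rather than the paper's API, and the paper's full method (online extension of existing trees and hierarchical smoothing for prediction) is omitted.

import numpy as np

def sample_mondrian_block(X, budget, time=0.0, rng=None):
    """Minimal sketch: recursively sample one (batch) Mondrian tree over X.

    `budget` caps the split times (the lifetime parameter); all names here
    are illustrative assumptions, not the paper's exact algorithm interface.
    """
    if rng is None:
        rng = np.random.default_rng()
    lower, upper = X.min(axis=0), X.max(axis=0)
    extent = upper - lower                  # side lengths of the bounding box
    rate = extent.sum()

    # Split time ~ Exponential(rate); if it exceeds the budget (or too few
    # points remain), stop and create a leaf for the data in this block.
    split_time = time + (rng.exponential(1.0 / rate) if rate > 0 else np.inf)
    if split_time > budget or len(X) < 2:
        return {"leaf": True, "lower": lower, "upper": upper, "n": len(X)}

    # Split dimension chosen with probability proportional to its extent,
    # split location uniform within the data range along that dimension.
    dim = rng.choice(len(extent), p=extent / rate)
    loc = rng.uniform(lower[dim], upper[dim])
    left = X[:, dim] <= loc

    return {
        "leaf": False, "dim": int(dim), "loc": float(loc), "time": split_time,
        "left": sample_mondrian_block(X[left], budget, split_time, rng),
        "right": sample_mondrian_block(X[~left], budget, split_time, rng),
    }

# Usage: a small forest is simply a set of independently sampled trees.
X = np.random.default_rng(0).normal(size=(100, 5))
forest = [sample_mondrian_block(X, budget=2.0) for _ in range(10)]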

[1] Horst Bischof, et al. On-line Random Forests. IEEE 12th International Conference on Computer Vision Workshops (ICCV Workshops), 2009.

[2] Geoff Hulten, et al. Mining high-speed data streams. KDD, 2000.

[3] Robert B. Gramacy, et al. Dynamic Trees for Learning and Design. arXiv:0912.1586, 2009.

[4] Rich Caruana, et al. An empirical comparison of supervised learning algorithms. ICML, 2006.

[5] Yee Whye Teh, et al. Top-down particle filtering for Bayesian decision trees. ICML, 2013.

[6] Paul E. Utgoff, et al. Incremental Induction of Decision Trees. Machine Learning, 1989.

[7] Misha Denil, et al. Consistency of Online Random Forests. ICML, 2013.

[8] Antonio Criminisi, et al. Decision Forests: A Unified Framework for Classification, Regression, Density Estimation, Manifold Learning and Semi-Supervised Learning. Foundations and Trends in Computer Graphics and Vision, 2012.

[9] Thomas P. Minka, et al. Bayesian model averaging is not model combination. 2002.

[10] Yee Whye Teh, et al. A stochastic memoizer for sequence data. ICML, 2009.

[11] Daniel M. Roy. Computability, inference and modeling in probabilistic programming. 2011.

[12] Leo Breiman, et al. Random Forests. Machine Learning, 2001.

[13] H. Chipman, et al. Bayesian CART Model Search. 1998.

[14] Yee Whye Teh, et al. The Mondrian Process. NIPS, 2008.

[15] Thomas G. Dietterich. Multiple Classifier Systems. Lecture Notes in Computer Science, 2000.

[16] Pierre Geurts, et al. Extremely randomized trees. Machine Learning, 2006.

[17] Leo Breiman, et al. Bagging Predictors. Machine Learning, 1996.

[18] Gaël Varoquaux, et al. Scikit-learn: Machine Learning in Python. Journal of Machine Learning Research, 2011.

[19] Yee Whye Teh, et al. A Hierarchical Bayesian Language Model Based On Pitman-Yor Processes. ACL, 2006.

[20] Stuart L. Crawford. Extensions to the CART Algorithm. International Journal of Man-Machine Studies, 1989.

[21] Raghu Ramakrishnan, et al. Proceedings of KDD 2000, the Sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, August 20-23, 2000, Boston, MA, USA. 2000.

[22] H. Chipman, et al. BART: Bayesian Additive Regression Trees. arXiv:0806.3286, 2008.

[23] Adrian F. M. Smith, et al. A Bayesian CART algorithm. 1998.

[24] Joshua Goodman, et al. A bit of progress in language modeling. Computer Speech & Language, 2001.

[25] Adele Cutler, et al. PERT – Perfect Random Tree Ensembles. 2001.