Mondrian Forests for Large-Scale Regression when Uncertainty Matters

Many real-world regression problems demand a measure of the uncertainty associated with each prediction. Standard decision forests deliver efficient state-of-the-art predictive performance, but high-quality uncertainty estimates are lacking. Gaussian processes (GPs) deliver uncertainty estimates, but scaling GPs to large-scale data sets comes at the cost of approximating the uncertainty estimates. We extend Mondrian forests, first proposed by Lakshminarayanan et al. (2014) for classification problems, to the large-scale non-parametric regression setting. Using a novel hierarchical Gaussian prior that dovetails with the Mondrian forest framework, we obtain principled uncertainty estimates, while still retaining the computational advantages of decision forests. Through a combination of illustrative examples, real-world large-scale datasets, and Bayesian optimization benchmarks, we demonstrate that Mondrian forests outperform approximate GPs on large-scale regression tasks and deliver better-calibrated uncertainty assessments than decision-forest-based methods.

[1]  Stephen E. Fienberg,et al.  The Comparison and Evaluation of Forecasters. , 1983 .

[2]  Judea Pearl,et al.  Probabilistic reasoning in intelligent systems - networks of plausible inference , 1991, Morgan Kaufmann series in representation and reasoning.

[3]  Leo Breiman,et al.  Random Forests , 2001, Machine Learning.

[4]  Carl E. Rasmussen,et al.  Evaluating Predictive Uncertainty Challenge , 2005, MLCW.

[5]  Pierre Geurts,et al.  Tree-Based Batch Mode Reinforcement Learning , 2005, J. Mach. Learn. Res..

[6]  Pierre Geurts,et al.  Extremely randomized trees , 2006, Machine Learning.

[7]  Nicolai Meinshausen,et al.  Quantile Regression Forests , 2006, J. Mach. Learn. Res..

[8]  Rich Caruana,et al.  An empirical comparison of supervised learning algorithms , 2006, ICML.

[9]  Yee Whye Teh,et al.  The Mondrian Process , 2008, NIPS.

[10]  Frank Hutter,et al.  Automated configuration of algorithms for solving hard computational problems , 2009 .

[11]  Michalis K. Titsias,et al.  Variational Learning of Inducing Variables in Sparse Gaussian Processes , 2009, AISTATS.

[12]  Andreas Krause,et al.  Information-Theoretic Regret Bounds for Gaussian Process Optimization in the Bandit Setting , 2009, IEEE Transactions on Information Theory.

[13]  Nando de Freitas,et al.  A Tutorial on Bayesian Optimization of Expensive Cost Functions, with Application to Active User Modeling and Hierarchical Reinforcement Learning , 2010, ArXiv.

[14]  Gaël Varoquaux,et al.  Scikit-learn: Machine Learning in Python , 2011, J. Mach. Learn. Res..

[15]  Kevin P. Murphy,et al.  Machine learning - a probabilistic perspective , 2012, Adaptive computation and machine learning series.

[16]  Jasper Snoek,et al.  Practical Bayesian Optimization of Machine Learning Algorithms , 2012, NIPS.

[17]  Antonio Criminisi,et al.  Decision Forests: A Unified Framework for Classification, Regression, Density Estimation, Manifold Learning and Semi-Supervised Learning , 2012, Found. Trends Comput. Graph. Vis..

[18]  Katharina Eggensperger,et al.  Towards an Empirical Foundation for Assessing Bayesian Optimization of Hyperparameters , 2013 .

[19]  Neil D. Lawrence,et al.  Gaussian Processes for Big Data , 2013, UAI.

[20]  Yee Whye Teh,et al.  Mondrian Forests: Efficient Online Random Forests , 2014, NIPS.

[21]  Kevin Leyton-Brown,et al.  Algorithm runtime prediction: Methods & evaluation , 2012, Artif. Intell..

[22]  Steven Reece,et al.  Automated Machine Learning on Big Data using Stochastic Algorithm Tuning , 2014 .

[23]  Carl E. Rasmussen,et al.  Distributed Variational Inference in Sparse Gaussian Process Regression and Latent Variable Models , 2014, NIPS.

[24]  Pushmeet Kohli,et al.  Just-In-Time Learning for Fast and Flexible Inference , 2014, NIPS.

[25]  H. Hoos,et al.  Algorithm Runtime Prediction : , 2015 .

[26]  Arthur Gretton,et al.  Kernel-Based Just-In-Time Learning for Passing Expectation Propagation Messages , 2015, UAI.

[27]  Prabhat,et al.  Scalable Bayesian Optimization Using Deep Neural Networks , 2015, ICML.

[28]  Marc Peter Deisenroth,et al.  Distributed Gaussian Processes , 2015, ICML.