Scalable Gaussian Process Structured Prediction for Grid Factor Graph Applications

Structured prediction is an important and well-studied problem with many applications across machine learning. GPstruct is a recently proposed structured prediction model that is kernelised, non-parametric, and supports Bayesian inference (Bratieres et al., 2013). The model places a Gaussian process prior over energy functions that describe the relationships between input variables and structured output variables. However, the memory demand of GPstruct is quadratic in the number of latent variables, and its training runtime scales cubically in the same quantity. This prevents GPstruct from being applied to problems involving grid factor graphs, which are prevalent in computer vision and spatial statistics. Here we explore a scalable approach to learning GPstruct models based on ensemble learning: weak learners (predictors) are trained on subsets of the latent variables and on bootstrap data, so training can easily be distributed. We report image segmentation experiments with 4M latent variables. Our method outperforms widely used conditional random field models trained with pseudo-likelihood. Moreover, on image segmentation problems it improves over recent state-of-the-art marginal optimisation methods in both predictive performance and uncertainty calibration. Finally, it generalises well across all training set sizes.
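To make the scaling argument concrete, the following back-of-the-envelope sketch estimates the memory needed for a dense covariance matrix over 4M latent variables, and how that changes if the latent variables are split across weak learners. It is only an illustration under stated assumptions: 8-byte floats, equal-sized disjoint subsets, and a hypothetical ensemble size `k` are choices made here, not details from the paper, and the bootstrap resampling of the data is not modelled.

```python
def dense_cov_bytes(n_latent: int, bytes_per_float: int = 8) -> int:
    """Bytes needed to store a dense n x n covariance matrix."""
    return n_latent * n_latent * bytes_per_float


def cubic_cost(n_latent: int) -> int:
    """Operation count proportional to n^3 (constant factor omitted)."""
    return n_latent ** 3


n = 4_000_000  # number of latent variables quoted in the abstract
k = 1_000      # hypothetical number of weak learners (assumption)

# Full model: quadratic memory in the number of latent variables.
full_bytes = dense_cov_bytes(n)

# Assumption: each weak learner sees an equal, disjoint subset of size n / k.
subset_size = n // k
per_learner_bytes = dense_cov_bytes(subset_size)

# Memory per learner shrinks by k^2; cubic cost per learner shrinks by k^3.
memory_ratio = full_bytes // per_learner_bytes
cost_ratio = cubic_cost(n) // cubic_cost(subset_size)

print(f"full covariance:        {full_bytes / 1e12:.0f} TB")
print(f"per-learner covariance: {per_learner_bytes / 1e6:.0f} MB")
print(f"memory reduction per learner: {memory_ratio:,}x")
print(f"cubic-cost reduction per learner: {cost_ratio:,}x")
```

Under these assumptions, the full matrix is far beyond a single machine's memory, while each subset fits comfortably, which is what makes distributing the weak learners attractive.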

[1]  Leo Breiman,et al.  Bagging Predictors , 1996, Machine Learning.

[2]  Tadayoshi Fushiki,et al.  Nonparametric bootstrap prediction , 2005 .

[3]  Joachim Denzler,et al.  A Fast Approach for Pixelwise Labeling of Facade Images , 2010, 2010 20th International Conference on Pattern Recognition.

[4]  Zoubin Ghahramani,et al.  MCMC for Doubly-intractable Distributions , 2006, UAI.

[5]  Zoubin Ghahramani,et al.  Sparse Gaussian Processes using Pseudo-inputs , 2005, NIPS.

[6]  Tomaso Poggio,et al.  Probabilistic Solution of Ill-Posed Problems in Computational Vision , 1987 .

[7]  Max Welling,et al.  Learning in Markov Random Fields: An Empirical Study , 2005 .

[8]  Carl E. Rasmussen,et al.  A Unifying View of Sparse Approximate Gaussian Process Regression , 2005, J. Mach. Learn. Res..

[9]  Andrew McCallum,et al.  An Introduction to Conditional Random Fields , 2010, Found. Trends Mach. Learn..

[10]  Neil D. Lawrence,et al.  Gaussian Processes for Big Data , 2013, UAI.

[11]  Anthony Widjaja,et al.  Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond , 2003, IEEE Transactions on Neural Networks.

[12]  Francis R. Bach,et al.  Exploring Large Feature Spaces with Hierarchical Multiple Kernel Learning , 2008, NIPS.

[13]  Ryan P. Adams,et al.  Elliptical slice sampling , 2009, AISTATS.

[14]  Pushmeet Kohli,et al.  Markov Random Fields for Vision and Image Processing , 2011 .

[15]  Volker Tresp,et al.  A Bayesian Committee Machine , 2000, Neural Computation.

[16]  Justin Domke,et al.  Learning Graphical Model Parameters with Approximate Marginal Inference , 2013, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[17]  J. Besag Statistical Analysis of Non-Lattice Data , 1975 .

[18]  Andrew Gordon Wilson,et al.  Gaussian Process Kernels for Pattern Discovery and Extrapolation , 2013, ICML.

[19]  Yee Whye Teh,et al.  Hybrid Variational/Gibbs Collapsed Inference in Topic Models , 2008, UAI.

[20]  Carl E. Rasmussen,et al.  Gaussian processes for machine learning , 2005, Adaptive computation and machine learning.

[21]  Michael I. Jordan,et al.  Graphical Models, Exponential Families, and Variational Inference , 2008, Found. Trends Mach. Learn..

[22]  Zoubin Ghahramani,et al.  Bayesian Learning in Undirected Graphical Models: Approximate MCMC Algorithms , 2004, UAI.

[23]  Xiaojin Zhu,et al.  Kernel conditional random fields: representation and clique selection , 2004, ICML.

[24]  Zoubin Ghahramani,et al.  Bayesian Structured Prediction Using Gaussian Processes , 2013, ArXiv.

[25]  Sebastian Nowozin,et al.  Decision tree fields , 2011, 2011 International Conference on Computer Vision.

[26]  Andrew McCallum,et al.  Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data , 2001, ICML.

[27]  Carl Edward Rasmussen,et al.  A Unifying View of Sparse Approximate Gaussian Process Regression , 2005 .

[28]  Stephen Gould,et al.  Decomposing a scene into geometric and semantically consistent regions , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[29]  Andrew W. Fitzgibbon,et al.  Real-time human pose recognition in parts from single depth images , 2011, CVPR 2011.

[30]  Joshua B. Tenenbaum,et al.  Structure Discovery in Nonparametric Regression through Compositional Kernel Search , 2013, ICML.

[31]  Lei Wang,et al.  MRF parameter estimation by MCMC method , 2000, Pattern Recognit..