Fast algorithms for large scale conditional 3D prediction

The potential success of discriminative learning approaches to 3D reconstruction relies on the ability to efficiently train predictive algorithms using sufficiently many examples that are representative of the typical configurations encountered in the application domain. Recent research indicates that sparse conditional Bayesian mixture of experts (cMoE) models (e.g. BME (Sminchisescu et al., 2005)) are adequate modeling tools that not only provide contextual 3D predictions for problems like human pose reconstruction, but can also represent multiple interpretations that result from depth ambiguities or occlusion. However, training conditional predictors requires sophisticated double-loop algorithms that scale unfavorably with the input dimension and the training set size, which has so far limited their use to 10,000 examples or fewer. In this paper we present large-scale algorithms, referred to as fBME, that combine forward feature selection and bound optimization in order to train probabilistic BME models with one order of magnitude more data (100,000 examples and up) and more than one order of magnitude faster. We present several large-scale experiments, including monocular evaluation on the HumanEva dataset (Sigal and Black, 2006), demonstrating how the proposed methods overcome the scaling limitations of existing ones.
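For orientation, a conditional mixture of experts of the kind referred to above is commonly written in the following form. This is a sketch under assumed notation, not taken from the abstract: r denotes the image descriptor (input), x the 3D state (output), M the number of experts, \lambda_i the gate parameters, and W_i, \Omega_i the regression weights and covariance of expert i.

```latex
% Conditional mixture of experts: input-dependent gates weight
% input-dependent expert predictions (notation assumed for illustration).
\[
p(x \mid r) \;=\; \sum_{i=1}^{M} g_i(r)\, p_i(x \mid r),
\qquad
g_i(r) \;=\; \frac{\exp(\lambda_i^{\top} r)}{\sum_{k=1}^{M} \exp(\lambda_k^{\top} r)},
\qquad
p_i(x \mid r) \;=\; \mathcal{N}\!\left(x \mid W_i r,\; \Omega_i\right).
\]
```

Because the gates g_i(r) are non-negative and sum to one for every input, the predictive density can remain multimodal wherever several experts receive non-negligible weight, which is how such a model can represent the multiple interpretations caused by depth ambiguities or occlusion.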

[1] Cristian Sminchisescu, et al. Discriminative density propagation for 3D human motion estimation, 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[2] D. Hunter, et al. Optimization Transfer Using Surrogate Objective Functions, 2000.

[3] Michael J. Black, et al. Predicting 3D People from 2D Pictures, 2006, AMDO.

[4] S. Lauritzen, et al. The TM algorithm for maximising a conditional likelihood function, 2001.

[5] L. Davis, et al. Background and foreground modeling using nonparametric kernel density estimation for visual surveillance, 2002, Proc. IEEE.

[6] Trevor Darrell, et al. Fast pose estimation with parameter-sensitive hashing, 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[7] Ronald Poppe, et al. Evaluating Example-based Pose Estimation: Experiments on the HumanEva Sets, 2007.

[8] Michael J. Black, et al. HumanEva: Synchronized Video and Motion Capture Dataset and Baseline Algorithm for Evaluation of Articulated Human Motion, 2010, International Journal of Computer Vision.

[9] Lawrence Carin, et al. Sparse multinomial logistic regression: fast algorithms and generalization bounds, 2005, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[10] Rómer Rosales, et al. Learning Body Pose via Specialized Maps, 2001, NIPS.

[11] Cristian Sminchisescu, et al. BM³E: Discriminative Density Propagation for Visual Tracking, 2007, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[12] D. Böhning. Multinomial logistic regression algorithm, 1992.

[13] Michael E. Tipping, et al. Fast Marginal Likelihood Maximisation for Sparse Bayesian Models, 2003, AISTATS.

[14] Cristian Sminchisescu, et al. Semi-supervised Hierarchical Models for 3D Human Pose Reconstruction, 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[15] Robert A. Jacobs, et al. Hierarchical Mixtures of Experts and the EM Algorithm, 1993, Neural Computation.

[16] Michael J. Black, et al. HumanEva: Synchronized Video and Motion Capture Dataset for Evaluation of Articulated Human Motion, 2006.

[17] Ji Zhu, et al. Kernel Logistic Regression and the Import Vector Machine, 2001, NIPS.

[18] J. Friedman. Greedy function approximation: A gradient boosting machine, 2001.

[19] Pascal Vincent, et al. Kernel Matching Pursuit, 2002, Machine Learning.

[20] Andrew W. Fitzgibbon, et al. The Joint Manifold Model for Semi-supervised Multi-valued Regression, 2007, 2007 IEEE 11th International Conference on Computer Vision.

[21] Alex Pentland, et al. On Reversing Jensen's Inequality, 2000, NIPS.

[22] Cristian Sminchisescu, et al. Twin Gaussian Processes for Structured Prediction, 2010, International Journal of Computer Vision.

[23] Neil D. Lawrence, et al. Fast Forward Selection to Speed Up Sparse Gaussian Process Regression, 2003, AISTATS.

[24] Ankur Agarwal, et al. 3D human pose from silhouettes by relevance vector regression, 2004, Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004. CVPR 2004.

[25] Isabelle Guyon, et al. An Introduction to Variable and Feature Selection, 2003, J. Mach. Learn. Res.