Learning equivariant structured output SVM regressors

Equivariance and invariance are often desired properties of a computer vision system. Currently available strategies, however, generally rely on one of three approaches: virtual sampling, which leaves open the question of how many samples are necessary; invariant feature representations, which can mistakenly discard information relevant to the vision task; or latent variable models, which result in non-convex training and expensive inference at test time. We propose a generalization of structured output SVM regressors that incorporates equivariance and invariance into a convex training procedure, enabling the use of large families of transformations while maintaining optimality and tractability. Importantly, inference at test time does not require the estimation of latent variables, and is therefore highly efficient. The resulting formulation treats equivariance and invariance naturally and is easily implemented as an adaptation of off-the-shelf optimization software, obviating the need for ad hoc sampling strategies. Theoretical results relating to vicinal risk, together with experiments on challenging aerial car and pedestrian detection tasks, demonstrate the effectiveness of the proposed solution.
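To make the setting concrete, the sketch below shows a plain structured output SVM regressor of the kind the abstract builds on, trained by stochastic subgradient descent with loss-augmented inference, and with the margin constraints replicated over a small, finite family of input/output transformations as a rough stand-in for equivariance constraints. All names (`joint_feature`, `most_violated_output`, `train_equivariant_ssvm`), the fixed-size sub-window feature map, the finite enumeration of transformations, and the subgradient solver are illustrative assumptions on my part, not the paper's formulation, which handles such constraints within a convex program rather than by explicit enumeration.

```python
import numpy as np

def joint_feature(x, y):
    """Joint feature map phi(x, y): the fixed-size sub-window of image x
    indexed by the box y = (row, col, height, width), flattened."""
    r, c, h, w = y
    return x[r:r + h, c:c + w].ravel()

def most_violated_output(w, x, y_true, candidates, loss):
    """Loss-augmented inference: argmax over candidate boxes of
    loss(y_true, y) + <w, phi(x, y)>."""
    scores = [loss(y_true, y) + w @ joint_feature(x, y) for y in candidates]
    return candidates[int(np.argmax(scores))]

def train_equivariant_ssvm(data, transforms, candidates, loss,
                           dim, C=1.0, epochs=20, lr=0.01):
    """Subgradient training of a structured output SVM in which each margin
    constraint is enforced not only on a training pair (x, y) but also on its
    transformed versions g(x, y), for every g in the finite family `transforms`
    (which should include the identity so the original pair is constrained too)."""
    w = np.zeros(dim)
    for _ in range(epochs):
        for x, y in data:
            for g in transforms:                      # g maps (x, y) -> (gx, gy)
                gx, gy = g(x, y)
                y_hat = most_violated_output(w, gx, gy, candidates, loss)
                violation = (loss(gy, y_hat)
                             + w @ joint_feature(gx, y_hat)
                             - w @ joint_feature(gx, gy))
                grad = w / C                          # gradient of the regularizer
                if violation > 0:                     # add hinge subgradient if violated
                    grad = grad + joint_feature(gx, y_hat) - joint_feature(gx, gy)
                w = w - lr * grad
    return w
```

In a hypothetical usage, `data` would be (image, box) pairs, `transforms` callables such as horizontal flips or small shifts applied jointly to image and box, `candidates` a list of fixed-size boxes, and `loss` something like one minus box overlap. Note that enumerating transformations in this way is essentially the virtual-sampling shortcut the abstract argues against, which is why this remains a sketch of the baseline machinery rather than of the proposed convex treatment.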
