Object and Action Classification with Latent Variables

In this paper we propose a generic framework to incorporate unobserved auxiliary information for classifying objects and actions. This framework allows us to explicitly account for localisation and alignment of representations for generic object and action classes as latent variables. We approach this problem in the discriminative setting as learning a max-margin classifier that infers the class label along with the latent variables. Through this paper we make the following contributions a) We provide a method for incorporating latent variables into object and action classification b) We specifically account for the presence of an explicit class related subregion which can include foreground and/or background. c) We explore a way to learn a better classifier by iterative expansion of the latent parameter space. We demonstrate the performance of our approach by rigorous experimental evaluation on a number of standard object and action recognition datasets.

[1]  Thomas Hofmann,et al.  Support vector machine learning for interdependent and structured output spaces , 2004, ICML.

[2]  Andrew Zisserman,et al.  Structured output regression for detection with partial truncation , 2009, NIPS.

[3]  Sebastian Nowozin,et al.  On feature combination for multiclass object classification , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[4]  Jason Weston,et al.  Curriculum learning , 2009, ICML '09.

[5]  Barbara Caputo,et al.  Recognizing human actions: a local SVM approach , 2004, Proceedings of the 17th International Conference on Pattern Recognition, 2004. ICPR 2004..

[6]  Thorsten Joachims,et al.  Learning structural SVMs with latent variables , 2009, ICML '09.

[7]  Daphne Koller,et al.  Self-Paced Learning for Latent Variable Models , 2010, NIPS.

[8]  Jianguo Zhang,et al.  The PASCAL Visual Object Classes Challenge , 2006 .

[9]  Pietro Perona,et al.  Learning Generative Visual Models from Few Training Examples: An Incremental Bayesian Approach Tested on 101 Object Categories , 2004, 2004 Conference on Computer Vision and Pattern Recognition Workshop.

[10]  Christopher Joseph Pal,et al.  Activity recognition using the velocity histories of tracked keypoints , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[11]  Andrew Zisserman,et al.  Multiple kernels for object detection , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[12]  Cordelia Schmid,et al.  Learning realistic human actions from movies , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[13]  Jean Ponce,et al.  Learning mid-level features for recognition , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[14]  Cordelia Schmid,et al.  Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[15]  Martial Hebert,et al.  Modeling the Temporal Extent of Actions , 2010, ECCV.

[16]  Yihong Gong,et al.  Locality-constrained Linear Coding for image classification , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[17]  Cordelia Schmid,et al.  Evaluation of Local Spatio-temporal Features for Action Recognition , 2009, BMVC.

[18]  Matthew B. Blaschko,et al.  Simultaneous Object Detection and Ranking with Weak Supervision , 2010, NIPS.

[19]  Ivan Laptev,et al.  On Space-Time Interest Points , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[20]  Luc Van Gool,et al.  The 2005 PASCAL Visual Object Classes Challenge , 2005, MLCW.

[21]  David G. Lowe,et al.  Object recognition from local scale-invariant features , 1999, Proceedings of the Seventh IEEE International Conference on Computer Vision.