Classification with Global, Local and Shared Features

We present a framework that jointly learns and then uses multiple image windows for improved classification. Apart from using the entire image content as context, class-specific windows are added, as well as windows that target class pairs. The location and extent of the windows are set automatically by handling the window parameters as latent variables. This framework makes the following contributions: a) the addition of localized information through the class-specific windows improves classification, b) windows introduced for the classification of class pairs further improve the results, c) the windows and classification parameters can be effectively learnt using a discriminative max-margin approach with latent variables, and d) the same framework is suited for multiple visual tasks such as classifying objects, scenes and actions. Experiments demonstrate the aforementioned claims.

[1]  Yoram Singer,et al.  Large margin hierarchical classification , 2004, ICML.

[2]  Andrew Zisserman,et al.  Multiple kernels for object detection , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[3]  Carsten Rother,et al.  Weakly supervised discriminative localization and classification: a joint learning process , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[4]  Ian D. Reid,et al.  High Five: Recognising human interactions in TV shows , 2010, BMVC.

[5]  Luc Van Gool,et al.  The 2005 PASCAL Visual Object Classes Challenge , 2005, MLCW.

[6]  David G. Lowe,et al.  Object recognition from local scale-invariant features , 1999, Proceedings of the Seventh IEEE International Conference on Computer Vision.

[7]  Antonio Torralba,et al.  Semantic Label Sharing for Learning with Many Categories , 2010, ECCV.

[8]  Nicolas Le Roux,et al.  Ask the locals: Multi-way local pooling for image recognition , 2011, 2011 International Conference on Computer Vision.

[9]  Yihong Gong,et al.  Locality-constrained Linear Coding for image classification , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[10]  Antonio Torralba,et al.  Recognizing indoor scenes , 2009, CVPR.

[11]  Alan L. Yuille,et al.  The Concave-Convex Procedure , 2003, Neural Computation.

[12]  Andrew W. Fitzgibbon,et al.  Efficient Object Category Recognition Using Classemes , 2010, ECCV.

[13]  Hao Su,et al.  Object Bank: A High-Level Image Representation for Scene Classification & Semantic Feature Sparsification , 2010, NIPS.

[14]  Thomas Deselaers,et al.  ClassCut for Unsupervised Class Segmentation , 2010, ECCV.

[15]  Andrew Zisserman,et al.  A Visual Vocabulary for Flower Classification , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[16]  Cordelia Schmid,et al.  Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[17]  Sebastian Nowozin,et al.  On feature combination for multiclass object classification , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[18]  Joshua B. Tenenbaum,et al.  Learning to share visual appearance for multiclass object detection , 2011, CVPR 2011.

[19]  Luc Van Gool,et al.  Object and Action Classification with Latent Variables , 2011, BMVC.

[20]  Cordelia Schmid,et al.  Semantic Hierarchies for Visual Object Recognition , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[21]  Ali Farhadi,et al.  Recognition using visual phrases , 2011, CVPR 2011.

[22]  Andrew Zisserman,et al.  Incremental learning of object detectors using a visual shape alphabet , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[23]  Ivan Laptev,et al.  On Space-Time Interest Points , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[24]  Fernando De la Torre,et al.  Joint segmentation and classification of human actions in video , 2011, CVPR 2011.

[25]  Thorsten Joachims,et al.  Learning structural SVMs with latent variables , 2009, ICML '09.

[26]  Cordelia Schmid,et al.  Learning realistic human actions from movies , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[27]  Svetlana Lazebnik,et al.  Scene recognition and weakly supervised object localization with deformable part-based models , 2011, 2011 International Conference on Computer Vision.

[28]  Thomas Hofmann,et al.  Support vector machine learning for interdependent and structured output spaces , 2004, ICML.

[29]  Christoph H. Lampert Maximum Margin Multi-Label Structured Prediction , 2011, NIPS.