Contextualizing Object Detection and Classification

We investigate how to iteratively and mutually boost object classification and detection performance by taking the outputs from one task as the context of the other one. While context models have been quite popular, previous works mainly concentrate on co-occurrence relationship within classes and few of them focus on contextualization from a top-down perspective, i.e. high-level task context. In this paper, our system adopts a new method for adaptive context modeling and iterative boosting. First, the contextualized support vector machine (Context-SVM) is proposed, where the context takes the role of dynamically adjusting the classification score based on the sample ambiguity, and thus the context-adaptive classifier is achieved. Then, an iterative training procedure is presented. In each step, Context-SVM, associated with the output context from one task (object classification or detection), is instantiated to boost the performance for the other task, whose augmented outputs are then further used to improve the former task by Context-SVM. The proposed solution is evaluated on the object classification and detection tasks of PASCAL Visual Object Classes Challenge (VOC) 2007, 2010 and SUN09 data sets, and achieves the state-of-the-art performance.

[1]  David H. Wolpert,et al.  Stacked generalization , 1992, Neural Networks.

[2]  Hayit Greenspan,et al.  Finding Pictures of Objects in Large Collections of Images , 1996, Object Representation in Computer Vision.

[3]  Matti Pietikäinen,et al.  A comparative study of texture measures with classification based on featured distributions , 1996, Pattern Recognit..

[4]  Bernhard Schölkopf,et al.  A Generalized Representer Theorem , 2001, COLT/EuroCOLT.

[5]  Nando de Freitas,et al.  A Statistical Model for General Contextual Object Recognition , 2004, ECCV.

[6]  Pietro Perona,et al.  Learning Generative Visual Models from Few Training Examples: An Incremental Bayesian Approach Tested on 101 Object Categories , 2004, 2004 Conference on Computer Vision and Pattern Recognition Workshop.

[7]  Antonio Torralba,et al.  Contextual Priming for Object Detection , 2003, International Journal of Computer Vision.

[8]  Christopher J. C. Burges,et al.  A Tutorial on Support Vector Machines for Pattern Recognition , 1998, Data Mining and Knowledge Discovery.

[9]  Martial Hebert,et al.  A hierarchical field framework for unified context-based classification , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[10]  Cordelia Schmid,et al.  Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[11]  Lior Wolf,et al.  A Critical View of Context , 2006, International Journal of Computer Vision.

[12]  David A. Forsyth,et al.  Animals on the Web , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[13]  Antonio Torralba,et al.  LabelMe: A Database and Web-Based Tool for Image Annotation , 2008, International Journal of Computer Vision.

[14]  Daphna Weinshall,et al.  Exploiting Object Hierarchy: Combining Models from Different Category Levels , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[15]  Cordelia Schmid,et al.  Learning Object Representations for Visual Object Class Recognition , 2007, ICCV 2007.

[16]  Zhuowen Tu,et al.  Auto-context and its application to high-level vision tasks , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[17]  Roberto Cipolla,et al.  Semantic texton forests for image categorization and segmentation , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[18]  Alexei A. Efros,et al.  An empirical study of context in object detection , 2009, CVPR.

[19]  Cordelia Schmid,et al.  Combining efficient object localization and image classification , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[20]  Luc Van Gool,et al.  The Pascal Visual Object Classes (VOC) Challenge , 2010, International Journal of Computer Vision.

[21]  Charless C. Fowlkes,et al.  Discriminative models for multi-class object layout , 2009, ICCV.

[22]  Yihong Gong,et al.  Linear spatial pyramid matching using sparse coding for image classification , 2009, CVPR.

[23]  David A. McAllester,et al.  Object Detection with Discriminatively Trained Part Based Models , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[24]  William T. Freeman,et al.  Latent hierarchical structural learning for object detection , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[25]  Serge J. Belongie,et al.  Context based object categorization: A critical survey , 2010, Comput. Vis. Image Underst..

[26]  Koen E. A. van de Sande,et al.  Evaluating Color Descriptors for Object and Scene Recognition , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[27]  Thomas Mensink,et al.  Improving the Fisher Kernel for Large-Scale Image Classification , 2010, ECCV.

[28]  Thomas S. Huang,et al.  Image Classification Using Super-Vector Coding of Local Image Descriptors , 2010, ECCV.

[29]  Antonio Torralba,et al.  Exploiting hierarchical context on a large database of object categories , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[30]  Fei-Fei Li,et al.  Modeling mutual context of object and human pose in human-object interaction activities , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[31]  Cordelia Schmid,et al.  Multimodal semi-supervised learning for image classification , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[32]  Chih-Jen Lin,et al.  LIBSVM: A library for support vector machines , 2011, TIST.

[33]  Fei-Fei Li,et al.  Hierarchical semantic indexing for large scale image retrieval , 2011, CVPR 2011.

[34]  Cordelia Schmid,et al.  Towards good practice in large-scale learning for image classification , 2012, CVPR.

[35]  Antonio Torralba,et al.  A Tree-Based Context Model for Object Recognition , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[36]  Ankur Datta,et al.  Efficient Maximum Appearance Search for Large-Scale Object Detection , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[37]  Qiang Chen,et al.  Contextualizing Object Detection and Classification , 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.