UvA-DARE (Digital Academic Repository) Codemaps Segment, Classify and Search Objects Locally

In this paper we aim for segmentation and classification of objects. We propose codemaps that are a joint formu-lation of the classification score and the local neighborhood it belongs to in the image. We obtain the codemap by reordering the encoding, pooling and classification steps over lattice elements. Other than existing linear decompositions who emphasize only the efficiency benefits for localized search, we make three novel contributions. As a preliminary, we provide a theoretical generalization of the sufficient mathematical conditions under which image encodings and classification becomes locally decomposable. As first novelty we introduce ℓ 2 normalization for arbitrarily shaped image regions, which is fast enough for semantic segmentation using our Fisher codemaps. Second, using the same lattice across images, we propose kernel pooling which embeds nonlinearities into codemaps for object classification by explicit or approximate feature mappings. Results demonstrate that ℓ 2 normalized Fisher codemaps improve the state-of-the-art in semantic segmentation for PASCAL VOC. For object classification the addition of nonlinearities brings us on par with the state-of-the-art, but is 3x faster. Because of the codemaps’ inherent efficiency, we can reach significant speed-ups for localized search as well. We exploit the efficiency gain for our third novelty: object segment retrieval using a single query image only.

[1]  Koen E. A. van de Sande,et al.  Selective Search for Object Recognition , 2013, International Journal of Computer Vision.

[2]  Gregory Shakhnarovich,et al.  Discriminative Re-ranking of Diverse Segmentations , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[3]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[4]  Cristian Sminchisescu,et al.  Semantic Segmentation with Second-Order Pooling , 2012, ECCV.

[5]  Cordelia Schmid,et al.  Aggregating Local Image Descriptors into Compact Codes , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[6]  Joachim M. Buhmann,et al.  Weakly supervised structured output learning for semantic segmentation , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[7]  Jitendra Malik,et al.  Semantic segmentation using regions and parts , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[8]  Kristen Grauman,et al.  Efficient region search for object detection , 2011, CVPR 2011.

[9]  Arnold W. M. Smeulders,et al.  The Visual Extent of an Object , 2011, International Journal of Computer Vision.

[10]  Gabriela Csurka,et al.  An Efficient Approach to Semantic Segmentation , 2011, International Journal of Computer Vision.

[11]  Thomas Mensink,et al.  Improving the Fisher Kernel for Large-Scale Image Classification , 2010, ECCV.

[12]  Pushmeet Kohli,et al.  Graph Cut Based Inference with Co-occurrence Statistics , 2010, ECCV.

[13]  Koen E. A. van de Sande,et al.  Evaluating Color Descriptors for Object and Scene Recognition , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[14]  Jean Ponce,et al.  A Theoretical Analysis of Feature Pooling in Visual Recognition , 2010, ICML.

[15]  Andrew Zisserman,et al.  Efficient additive kernels via explicit feature maps , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[16]  Luc Van Gool,et al.  The Pascal Visual Object Classes (VOC) Challenge , 2010, International Journal of Computer Vision.

[17]  Cristian Sminchisescu,et al.  Efficient Match Kernel between Sets of Features for Visual Recognition , 2009, NIPS.

[18]  Christoph H. Lampert,et al.  Efficient Subwindow Search: A Branch and Bound Framework for Object Localization , 2009, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[19]  Sven J. Dickinson,et al.  TurboPixels: Fast Superpixels Using Geometric Flows , 2009, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[20]  Cordelia Schmid,et al.  Combining efficient object localization and image classification , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[21]  Subhransu Maji,et al.  Max-margin additive classifiers for detection , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[22]  Jitendra Malik,et al.  From contours to regions: An empirical evaluation , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[23]  Fei-Fei Li,et al.  ImageNet: A large-scale hierarchical image database , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[24]  Yihong Gong,et al.  Linear spatial pyramid matching using sparse coding for image classification , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[25]  Trevor Darrell,et al.  The Pyramid Match Kernel: Efficient Learning with Sets of Features , 2007, J. Mach. Learn. Res..

[26]  Paul Over,et al.  Evaluation campaigns and TRECVid , 2006, MIR '06.

[27]  Cordelia Schmid,et al.  Local Features and Kernels for Classification of Texture and Object Categories: A Comprehensive Study , 2006, 2006 Conference on Computer Vision and Pattern Recognition Workshop (CVPRW'06).

[28]  Cordelia Schmid,et al.  Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[29]  Jianguo Zhang,et al.  The PASCAL Visual Object Classes Challenge , 2006 .

[30]  Cordelia Schmid,et al.  Local Features and Kernels for Classification of Texture and Object Categories: A Comprehensive Study , 2006, CVPR Workshops.

[31]  G LoweDavid,et al.  Distinctive Image Features from Scale-Invariant Keypoints , 2004 .

[32]  David G. Lowe,et al.  Distinctive Image Features from Scale-Invariant Keypoints , 2004, International Journal of Computer Vision.

[33]  Gabriela Csurka,et al.  Visual categorization with bags of keypoints , 2002, eccv 2004.

[34]  Andrew Zisserman,et al.  Video Google: a text retrieval approach to object matching in videos , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[35]  Jitendra Malik,et al.  Learning a classification model for segmentation , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[36]  Jitendra Malik,et al.  Blobworld: Image Segmentation Using Expectation-Maximization and Its Application to Image Querying , 2002, IEEE Trans. Pattern Anal. Mach. Intell..

[37]  Alexander J. Smola,et al.  Learning with kernels , 1998 .