Accurate Object Recognition with Shape Masks

In this paper we propose an object recognition approach that is based on shape masks—generalizations of segmentation masks. As shape masks carry information about the extent (outline) of objects, they provide a convenient tool to exploit the geometry of objects. We apply our ideas to two common object class recognition tasks—classification and localization. For classification, we extend the orderless bag-of-features image representation. In the proposed setup shape masks can be seen as weak geometrical constraints over bag-of-features. Those constraints can be used to reduce background clutter and help recognition. For localization, we propose a new recognition scheme based on high-dimensional hypothesis clustering. Shape masks allow to go beyond bounding boxes and determine the outline (approximate segmentation) of the object during localization. Furthermore, the method easily learns and detects possible object viewpoints and articulations, which are often well characterized by the object outline. Our experiments reveal that shape masks can improve recognition accuracy of state-of-the-art methods while returning richer recognition answers at the same time. We evaluate the proposed approach on the challenging natural-scene Graz-02 object classes dataset.

[1]  M. Peterson Object Recognition Processes Can and Do Operate Before Figure–Ground Organization , 1994 .

[2]  Takeo Kanade,et al.  Neural Network-Based Face Detection , 1998, IEEE Trans. Pattern Anal. Mach. Intell..

[3]  Takeo Kanade,et al.  Rotation Invariant Neural Network-Based Face Detection , 1998, Proceedings. 1998 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (Cat. No.98CB36231).

[4]  R. O’Reilly,et al.  Figure-ground organization and object recognition processes: an interactive account. , 1998, Journal of experimental psychology. Human perception and performance.

[5]  Patrick Haffner,et al.  Support vector machines for histogram-based image classification , 1999, IEEE Trans. Neural Networks.

[6]  Paul A. Viola,et al.  Robust Real-time Object Detection , 2001 .

[7]  Refractor Vision , 2000, The Lancet.

[8]  Dan Roth,et al.  Learning a Sparse Representation for Object Detection , 2002, ECCV.

[9]  Shimon Ullman,et al.  Class-Specific, Top-Down Segmentation , 2002, ECCV.

[10]  Bo Zhang,et al.  Support vector machines for region-based image retrieval , 2003, 2003 International Conference on Multimedia and Expo. ICME '03. Proceedings (Cat. No.03TH8698).

[11]  Cordelia Schmid,et al.  3D object modeling and recognition using affine-invariant patches and multi-view spatial constraints , 2003, 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2003. Proceedings..

[12]  Andrew Zisserman,et al.  Video Google: a text retrieval approach to object matching in videos , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[13]  Cordelia Schmid,et al.  Selection of scale-invariant parts for object class recognition , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[14]  Jianbo Shi,et al.  Object-Specific Figure-Ground Segregation , 2003, CVPR.

[15]  Peter Auer,et al.  Weak Hypotheses and Boosting for Generic Object Detection and Recognition , 2004, ECCV.

[16]  Jitendra Malik,et al.  Spectral grouping using the Nystrom method , 2004, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[17]  G LoweDavid,et al.  Distinctive Image Features from Scale-Invariant Keypoints , 2004 .

[18]  Cordelia Schmid,et al.  Scale & Affine Invariant Interest Point Detectors , 2004, International Journal of Computer Vision.

[19]  Mario Fritz,et al.  On the Significance of Real-World Conditions for Material Classification , 2004, ECCV.

[20]  Paul A. Viola,et al.  Robust Real-Time Face Detection , 2001, International Journal of Computer Vision.

[21]  Tony Lindeberg,et al.  Feature Detection with Automatic Scale Selection , 1998, International Journal of Computer Vision.

[22]  Dan Roth,et al.  Learning to detect objects in images via a sparse, part-based representation , 2004, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[23]  Gabriela Csurka,et al.  Visual categorization with bags of keypoints , 2002, eccv 2004.

[24]  P. Matsakis,et al.  The use of force histograms for affine-invariant relative position description , 2004 .

[25]  Tony Lindeberg,et al.  Direct computation of shape cues using scale-adapted spatial derivative operators , 1996, International Journal of Computer Vision.

[26]  Leonidas J. Guibas,et al.  The Earth Mover's Distance as a Metric for Image Retrieval , 2000, International Journal of Computer Vision.

[27]  Cordelia Schmid,et al.  A maximum entropy framework for part-based texture and object recognition , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[28]  Bernt Schiele,et al.  Pedestrian detection in crowded scenes , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[29]  Trevor Darrell,et al.  The pyramid match kernel: discriminative classification with sets of image features , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[30]  Nebojsa Jojic,et al.  LOCUS: learning object classes with unsupervised segmentation , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[31]  Andrew Blake,et al.  Contour-based learning for object detection , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[32]  Cordelia Schmid,et al.  3D Object Modeling and Recognition Using Local Affine-Invariant Image Descriptors and Multi-View Spatial Constraints , 2006, International Journal of Computer Vision.

[33]  Alexei A. Efros,et al.  Discovering objects and their location in images , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[34]  Siwei Lyu,et al.  Mercer kernels for object recognition with local features , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[35]  Bernt Schiele,et al.  Integrating representative and discriminant models for object category detection , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[36]  Axel Pinz,et al.  Object Localization with Boosting and Weak Supervision for Generic Object Recognition , 2005, SCIA.

[37]  Cordelia Schmid,et al.  The 2005 PASCAL Visual Object Classes Challenge , 2005, MLCW.

[38]  A. Atiya,et al.  Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond , 2005, IEEE Transactions on Neural Networks.

[39]  Luc Van Gool,et al.  Towards Multi-View Object Class Detection , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[40]  Jianguo Zhang,et al.  The PASCAL Visual Object Classes Challenge , 2006 .

[41]  Alexei A. Efros,et al.  Using Multiple Segmentations to Discover Objects and their Extent in Image Collections , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[42]  Pietro Perona,et al.  Weakly Supervised Scale-Invariant Learning of Models for Visual Recognition , 2007, International Journal of Computer Vision.

[43]  Narendra Ahuja,et al.  Extracting Subimages of an Unknown Category from a Set of Images , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[44]  Luc Van Gool,et al.  The 2005 PASCAL Visual Object Classes Challenge , 2005, MLCW.

[45]  Peter Auer,et al.  Generic object recognition with boosting , 2006, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[46]  Bernt Schiele,et al.  Cross-Articulation Learning for Robust Detection of Pedestrians , 2006, DAGM-Symposium.

[47]  Bernt Schiele,et al.  Multi-Aspect Detection of Articulated Objects , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[48]  Axel Pinz,et al.  Object localization/segmentation using generic shape priors , 2006, 18th International Conference on Pattern Recognition (ICPR'06).

[49]  Cordelia Schmid,et al.  Local Features and Kernels for Classification of Texture and Object Categories: A Comprehensive Study , 2006, 2006 Conference on Computer Vision and Pattern Recognition Workshop (CVPRW'06).

[50]  Cordelia Schmid,et al.  Spatial Weighting for Bag-of-Features , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[51]  Deva Ramanan,et al.  Using Segmentation to Verify Object Hypotheses , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[52]  Bernt Schiele,et al.  Robust Object Detection with Interleaved Categorization and Segmentation , 2008, International Journal of Computer Vision.

[53]  Cordelia Schmid,et al.  Accurate Object Localization with Shape Masks , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[54]  R. Nevatia,et al.  Simultaneous Object Detection and Segmentation by Boosting Local Shape Feature based Classifier , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[55]  Boris Babenko,et al.  Weakly Supervised Object Localization with Stable Segmentations , 2008, ECCV.

[56]  Roberto Cipolla,et al.  Semantic texton forests for image categorization and segmentation , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[57]  Pablo Arbeláez,et al.  Recognition using regions , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[58]  Fei-Fei Li,et al.  Towards total scene understanding: Classification, annotation and segmentation in an automatic framework , 2009, CVPR.

[59]  Luc Van Gool,et al.  The Pascal Visual Object Classes (VOC) Challenge , 2010, International Journal of Computer Vision.