Automatic discovery of meaningful object parts with latent CRFs

Object recognition is challenging due to high intra-class variability caused, e.g., by articulation, viewpoint changes, and partial occlusion. Successful methods need to strike a balance between being flexible enough to model such variation and discriminative enough to detect objects in cluttered, real world scenes. Motivated by these challenges we propose a latent conditional random field (CRF) based on a flexible assembly of parts. By modeling part labels as hidden nodes and developing an EM algorithm for learning from class labels alone, this new approach enables the automatic discovery of semantically meaningful object part representations. To increase the flexibility and expressiveness of the model, we learn the pairwise structure of the underlying graphical model at the level of object part interactions. Efficient gradient-based techniques are used to estimate the structure of the domain of interest and carried forward to the multi-label or object part case. Our experiments illustrate the meaningfulness of the discovered parts and demonstrate state-of-the-art performance of the approach.

[1]  G LoweDavid,et al.  Distinctive Image Features from Scale-Invariant Keypoints , 2004 .

[2]  Yali Amit,et al.  POP: Patchwork of Parts Models for Object Recognition , 2007, International Journal of Computer Vision.

[3]  Daniel P. Huttenlocher,et al.  Pictorial Structures for Object Recognition , 2004, International Journal of Computer Vision.

[4]  Takeo Kanade,et al.  Object Detection Using the Statistics of Parts , 2004, International Journal of Computer Vision.

[5]  Bernt Schiele,et al.  Hierarchical Support Vector Random Fields: Joint Training to Combine Local and Global Features , 2008, ECCV.

[6]  Pietro Perona,et al.  Multiple Component Learning for Object Detection , 2008, ECCV.

[7]  Bernt Schiele,et al.  Discriminative structure learning of hierarchical representations for object detection , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[8]  Guillaume Bouchard,et al.  Hierarchical part-based visual object categorization , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[9]  Trevor Darrell,et al.  Hidden Conditional Random Fields , 2007, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[10]  Charless C. Fowlkes,et al.  Discriminative Models for Multi-Class Object Layout , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[11]  Pietro Perona,et al.  Object class recognition by unsupervised scale-invariant learning , 2003, 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2003. Proceedings..

[12]  Martin A. Fischler,et al.  The Representation and Matching of Pictorial Structures , 1973, IEEE Transactions on Computers.

[13]  Ashish Kapoor,et al.  Located Hidden Random Fields: Learning Discriminative Parts for Object Detection , 2006, ECCV.

[14]  Shimon Ullman,et al.  Semantic Hierarchies for Recognizing Objects and Parts , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[15]  Cordelia Schmid,et al.  Combining efficient object localization and image classification , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[16]  Thorsten Joachims,et al.  Making large scale SVM learning practical , 1998 .

[17]  Bill Triggs,et al.  Histograms of oriented gradients for human detection , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[18]  Subhransu Maji,et al.  Classification using intersection kernel support vector machines is efficient , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[19]  Michael Goesele,et al.  A shape-based object class model for knowledge transfer , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[20]  Mark W. Schmidt,et al.  Structure learning in random fields for heart motion abnormality detection , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[21]  David A. McAllester,et al.  A discriminatively trained, multiscale, deformable part model , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[22]  Matthijs C. Dorst Distinctive Image Features from Scale-Invariant Keypoints , 2011 .

[23]  David A. Forsyth,et al.  Configuration Estimates Improve Pedestrian Finding , 2007, NIPS.

[24]  Antonio Torralba,et al.  Describing Visual Scenes Using Transformed Objects and Parts , 2008, International Journal of Computer Vision.

[25]  Bernt Schiele,et al.  Pedestrian detection in crowded scenes , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[26]  Trevor Darrell,et al.  The Pyramid Match Kernel: Efficient Learning with Sets of Features , 2007, J. Mach. Learn. Res..

[27]  Daniel P. Huttenlocher,et al.  Weakly Supervised Learning of Part-Based Spatial Models for Visual Object Recognition , 2006, ECCV.

[28]  Cordelia Schmid,et al.  Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[29]  Andrew Zisserman,et al.  Multiple kernels for object detection , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[30]  Larry S. Davis,et al.  Multiple instance fFeature for robust part-based object detection , 2009, CVPR.

[31]  Andrew Zisserman,et al.  Efficient discriminative learning of parts-based models , 2009, 2009 IEEE 12th International Conference on Computer Vision.