Beyond the Sum of Parts: Voting with Groups of Dependent Entities

Hough voting methods efficiently handle the high complexity of multi-scale, category-level object detection in cluttered scenes. However, the main shortcoming of the approach is that mutually dependent local observations cast their votes independently for intrinsically global object properties such as object scale. Object hypotheses are then assumed to be a mere sum of their part votes. Popular representation schemes, however, are based on a dense sampling of semi-local image features, which are consequently mutually dependent. We take advantage of part dependencies and incorporate them into probabilistic Hough voting by deriving an objective function that connects three intimately related problems: i) grouping mutually dependent parts, ii) solving the correspondence problem conjointly for dependent parts, and iii) finding concerted object hypotheses using extended groups rather than local observations alone. Early commitments are avoided by not restricting parts to a single vote for a locally best correspondence, and we learn a weighting of parts during training to reflect their differing relevance for an object. Experiments demonstrate the benefit of incorporating part dependencies through grouping into Hough voting. The joint optimization of groupings, correspondences, and votes not only improves detection accuracy over standard Hough voting and a sliding window baseline, but also reduces computational complexity by significantly decreasing the number of candidate hypotheses.
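To make the criticized baseline concrete, the following is a minimal sketch of standard Hough voting, in which every part votes independently and a hypothesis score is just the sum of the part votes it receives. It is not the grouped voting proposed here. The function names, the `(dx, dy, weight)` vote format, the bin size, and the toy data are all assumptions made for illustration.

```python
from collections import defaultdict


def hough_vote(parts, bin_size=10.0):
    """Accumulate independent, weighted part votes for an object centre.

    parts: list of (x, y, votes), where votes is a list of (dx, dy, weight).
    Each (dx, dy) is a hypothetical offset from the part to the object centre,
    and weight is a hypothetical per-vote relevance.
    """
    accumulator = defaultdict(float)
    for x, y, votes in parts:
        # A part may cast several votes, not only its locally best one.
        for dx, dy, weight in votes:
            # Each vote is added on its own, ignoring all other parts.
            cell = (int((x + dx) // bin_size), int((y + dy) // bin_size))
            accumulator[cell] += weight
    return accumulator


def best_hypothesis(accumulator):
    """Return the (cell, score) pair with the highest summed vote weight."""
    return max(accumulator.items(), key=lambda item: item[1])


# Toy data (assumed): three parts, two candidate votes each.
parts = [
    (20.0, 30.0, [(30.0, 20.0, 0.6), (-15.0, 5.0, 0.4)]),
    (70.0, 40.0, [(-18.0, 12.0, 0.7), (25.0, 25.0, 0.3)]),
    (45.0, 80.0, [(6.0, -27.0, 0.5), (40.0, 10.0, 0.5)]),
]

acc = hough_vote(parts)
cell, score = best_hypothesis(acc)
print(cell, round(score, 2))
```

In this toy setup one vote from each part falls into the same accumulator cell, while the remaining votes scatter into separate cells as spurious candidates. Voting with groups of dependent parts is meant to suppress such stray candidates by letting a group agree on a hypothesis jointly.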
