Robust Object Detection with Interleaved Categorization and Segmentation

Abstract This paper presents a novel method for detecting and localizing objects of a visual category in cluttered real-world scenes. Our approach considers object categorization and figure-ground segmentation as two interleaved processes that closely collaborate towards a common goal. As shown in our work, the tight coupling between those two processes allows them to benefit from each other and improve the combined performance. The core part of our approach is a highly flexible learned representation for object shape that can combine the information observed on different training examples in a probabilistic extension of the Generalized Hough Transform. The resulting approach can detect categorical objects in novel images and automatically infer a probabilistic segmentation from the recognition result. This segmentation is then in turn used to again improve recognition by allowing the system to focus its efforts on object pixels and to discard misleading influences from the background. Moreover, the information from where in the image a hypothesis draws its support is employed in an MDL based hypothesis verification stage to resolve ambiguities between overlapping hypotheses and factor out the effects of partial occlusion. An extensive evaluation on several large data sets shows that the proposed system is applicable to a range of different object categories, including both rigid and articulated objects. In addition, its flexible representation allows it to achieve competitive object detection performance already from training sets that are between one and two orders of magnitude smaller than those used in comparable systems.

[1]  J. MacQueen Some methods for classification and analysis of multivariate observations , 1967 .

[2]  R. F. Brown,et al.  PERFORMANCE EVALUATION , 2019, ISO 22301:2019 and business continuity management – Understand how to plan, implement and enhance a business continuity management system (BCMS).

[3]  Michel Bruynooghe,et al.  Méthodes nouvelles en classification automatique de données taxinomiques nombreuses , 1977 .

[4]  C. de Rham,et al.  La classification hiérarchique ascendante selon la méthode des voisins réciproques , 1980 .

[5]  Dana H. Ballard,et al.  Generalizing the Hough transform to detect arbitrary shapes , 1981, Pattern Recognit..

[6]  H. Edelsbrunner,et al.  Efficient algorithms for agglomerative hierarchical clustering methods , 1984 .

[7]  Christopher G. Harris,et al.  A Combined Corner and Edge Detector , 1988, Alvey Vision Conference.

[8]  Ruzena Bajcsy,et al.  Segmentation versus object representation—are they separable? , 1989 .

[9]  M. Peterson Object Recognition Processes Can and Do Operate Before Figure–Ground Organization , 1994 .

[10]  Yizong Cheng,et al.  Mean Shift, Mode Seeking, and Clustering , 1995, IEEE Trans. Pattern Anal. Mach. Intell..

[11]  Cordelia Schmid,et al.  Combining greyvalue invariants with local constraints for object recognition , 1996, Proceedings CVPR IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[12]  Michael J. Jones,et al.  Model-Based Matching by Linear Combinations of Prototypes , 1996 .

[13]  J.-P. Benzécri,et al.  Rappel : Construction d'une classification ascendante hiérarchique par la recherche en chaîne des voisins réciproques , 1997 .

[14]  Norbert Krüger,et al.  Face recognition by elastic bunch graph matching , 1997, Proceedings of International Conference on Image Processing.

[15]  Jitendra Malik,et al.  Normalized cuts and image segmentation , 1997, Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[16]  Stan Sclaroff,et al.  Deformable prototypes for encoding shape categories in image databases , 1995, Pattern Recognit..

[17]  Pietro Perona,et al.  A Probabilistic Approach to Object Recognition Using Local Photometry and Global Geometry , 1998, ECCV.

[18]  S. Ullman Three-dimensional object recognition based on the combination of views , 1998, Cognition.

[19]  Takeo Kanade,et al.  Neural Network-Based Face Detection , 1998, IEEE Trans. Pattern Anal. Mach. Intell..

[20]  Timothy F. Cootes,et al.  Active Appearance Models , 1998, ECCV.

[21]  R. O’Reilly,et al.  Figure-ground organization and object recognition processes: an interactive account. , 1998, Journal of experimental psychology. Human perception and performance.

[22]  W. Eric L. Grimson,et al.  Adaptive background mixture models for real-time tracking , 1999, Proceedings. 1999 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (Cat. No PR00149).

[23]  David G. Lowe,et al.  Object recognition from local scale-invariant features , 1999, Proceedings of the Seventh IEEE International Conference on Computer Vision.

[24]  Pietro Perona,et al.  Unsupervised learning of models for object recognition , 2000 .

[25]  Ronen Basri,et al.  Fast multiscale image segmentation , 2000, Proceedings IEEE Conference on Computer Vision and Pattern Recognition. CVPR 2000 (Cat. No.PR00662).

[26]  Pietro Perona,et al.  Towards automatic discovery of object categories , 2000, Proceedings IEEE Conference on Computer Vision and Pattern Recognition. CVPR 2000 (Cat. No.PR00662).

[27]  Thomas Serre,et al.  Component-based face detection , 2001, Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. CVPR 2001.

[28]  Tomaso A. Poggio,et al.  Example-Based Object Detection in Images by Components , 2001, IEEE Trans. Pattern Anal. Mach. Intell..

[29]  Dorin Comaniciu,et al.  The Variable Bandwidth Mean Shift and Data-Driven Scale Selection , 2001, ICCV.

[30]  Refractor Vision , 2000, The Lancet.

[31]  A. Needham Object recognition and object segregation in 4.5-month-old infants. , 2001, Journal of experimental child psychology.

[32]  Shimon Ullman,et al.  Class-Specific, Top-Down Segmentation , 2002, ECCV.

[33]  Dorin Comaniciu,et al.  Mean Shift: A Robust Approach Toward Feature Space Analysis , 2002, IEEE Trans. Pattern Anal. Mach. Intell..

[34]  Jiri Matas,et al.  Robust wide-baseline stereo from maximally stable extremal regions , 2004, Image Vis. Comput..

[35]  Cordelia Schmid,et al.  Learning to Parse Pictures of People , 2002, ECCV.

[36]  Thomas S. Huang,et al.  Fusion of global and local information for object detection , 2002, Object recognition supported by user interaction for service robots.

[37]  Michel Vidal-Naquet,et al.  Visual features of intermediate complexity and their use in classification , 2002, Nature Neuroscience.

[38]  Derek R. Magee,et al.  Detecting lameness using 'Re-sampling Condensation' and 'multi-stream cyclic hidden Markov models' , 2002, Image Vis. Comput..

[39]  B. Schiele,et al.  Interleaved Object Categorization and Segmentation , 2003, BMVC.

[40]  Pietro Perona,et al.  Object class recognition by unsupervised scale-invariant learning , 2003, 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2003. Proceedings..

[41]  Cordelia Schmid,et al.  Selection of scale-invariant parts for object class recognition , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[42]  Pietro Perona,et al.  A Bayesian approach to unsupervised one-shot learning of object categories , 2003, ICCV 2003.

[43]  Zhuowen Tu,et al.  Image Parsing: Unifying Segmentation, Detection, and Recognition , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[44]  Pietro Perona,et al.  A Bayesian approach to unsupervised one-shot learning of object categories , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[45]  Robert T. Collins,et al.  Mean-shift blob tracking through scale space , 2003, 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2003. Proceedings..

[46]  Jianbo Shi,et al.  Object-Specific Figure-Ground Segregation , 2003, CVPR.

[47]  Peter Auer,et al.  Weak Hypotheses and Boosting for Generic Object Detection and Recognition , 2004, ECCV.

[48]  Michael Jones,et al.  Multidimensional Morphable Models: A Framework for Representing and Matching Object Classes , 2004, International Journal of Computer Vision.

[49]  Shimon Ullman,et al.  Combining Top-Down and Bottom-Up Segmentation , 2004, 2004 Conference on Computer Vision and Pattern Recognition Workshop.

[50]  Michael Brady,et al.  Saliency, Scale and Image Description , 2001, International Journal of Computer Vision.

[51]  T. Tuytelaars,et al.  Matching Widely Separated Views Based on Affine Invariant Regions , 2004, International Journal of Computer Vision.

[52]  Jitendra Malik,et al.  Contour and Texture Analysis for Image Segmentation , 2001, International Journal of Computer Vision.

[53]  Takeo Kanade,et al.  Object Detection Using the Statistics of Parts , 2004, International Journal of Computer Vision.

[54]  B. Schiele,et al.  Combined Object Categorization and Segmentation With an Implicit Shape Model , 2004 .

[55]  A. Torralba,et al.  Sharing features: efficient boosting procedures for multiclass object detection , 2004, Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004. CVPR 2004..

[56]  Bernt Schiele,et al.  Scale-Invariant Object Categorization Using a Scale-Adaptive Mean-Shift Search , 2004, DAGM-Symposium.

[57]  G LoweDavid,et al.  Distinctive Image Features from Scale-Invariant Keypoints , 2004 .

[58]  Antonio Torralba,et al.  Sharing features: efficient boosting procedures for multiclass object detection , 2004, CVPR 2004.

[59]  Christophe Garcia,et al.  Convolutional face finder: a neural architecture for fast and robust face detection , 2004, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[60]  Stefan Carlsson,et al.  Appearance Based Qualitative Image Description for Object Class Recognition , 2004, ECCV.

[61]  Paul A. Viola,et al.  Robust Real-Time Face Detection , 2001, International Journal of Computer Vision.

[62]  Tony Lindeberg,et al.  Feature Detection with Automatic Scale Selection , 1998, International Journal of Computer Vision.

[63]  Dan Roth,et al.  Learning to detect objects in images via a sparse, part-based representation , 2004, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[64]  Daniel P. Huttenlocher,et al.  Pictorial Structures for Object Recognition , 2004, International Journal of Computer Vision.

[65]  Alan L. Yuille,et al.  Feature extraction from faces using deformable templates , 2004, International Journal of Computer Vision.

[66]  Tomaso A. Poggio,et al.  A Trainable System for Object Detection , 2000, International Journal of Computer Vision.

[67]  Luc Van Gool,et al.  Simultaneous Object Recognition and Segmentation by Image Exploration , 2004, ECCV.

[68]  Cordelia Schmid,et al.  Human Detection Based on a Probabilistic Assembly of Robust Part Detectors , 2004, ECCV.

[69]  Pietro Perona,et al.  A sparse object category model for efficient learning and exhaustive recognition , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[70]  Bernt Schiele,et al.  Pedestrian detection in crowded scenes , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[71]  Zhuowen Tu,et al.  Image Parsing: Unifying Segmentation, Detection, and Recognition , 2005, International Journal of Computer Vision.

[72]  Hermann Ney,et al.  Improving a Discriminative Approach to Object Recognition Using Image Patches , 2005, DAGM-Symposium.

[73]  Cordelia Schmid,et al.  A Comparison of Affine Region Detectors , 2005, International Journal of Computer Vision.

[74]  Ruzena Bajcsy,et al.  Segmentation of range images as the search for geometric parametric models , 1995, International Journal of Computer Vision.

[75]  Cordelia Schmid,et al.  A performance evaluation of local descriptors , 2005, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[76]  Bernt Schiele,et al.  An Evaluation of Local Shape-Based Features for Pedestrian Detection , 2005, BMVC.

[77]  Bill Triggs,et al.  Histograms of oriented gradients for human detection , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[78]  Wei Zhang,et al.  Object class recognition using multiple layer boosting with heterogeneous features , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[79]  Ramakant Nevatia,et al.  Detection of multiple, partially occluded humans in a single image by Bayesian combination of edgelet part detectors , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[80]  Cordelia Schmid,et al.  The 2005 PASCAL Visual Object Classes Challenge , 2005, MLCW.

[81]  Bernt Schiele,et al.  Local features for object class recognition , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[82]  Ido Dagan,et al.  Machine Learning Challenges: Evaluating Predictive Uncertainty, Visual Object Classification, and Recognizing Textual EntailmentFirst Pascal Machine Learning ... / Lecture Notes in Artificial Intelligence) , 2006 .

[83]  Cordelia Schmid,et al.  Local Features and Kernels for Classification of Texture and Object Categories: A Comprehensive Study , 2006, 2006 Conference on Computer Vision and Pattern Recognition Workshop (CVPRW'06).

[84]  J Quinonero Candela,et al.  Machine Learning Challenges. Evaluating Predictive Uncertainty, Visual Object Classification, and Recognising Tectual Entailment , 2006, Lecture Notes in Computer Science.

[85]  David G. Lowe,et al.  Multiclass Object Recognition with Sparse, Localized Features , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).