Robust model-based scene interpretation by multilayered context information

In this paper, we present a new graph-based framework for collaborative place, object, and part recognition in indoor environments. We model a scene as an undirected graphical model composed of a place node, object nodes, and part nodes connected by undirected links. Our key contribution is collaborative place and object recognition, which we call the hierarchical context in this paper, in place of object-only recognition or a one-way causal relation from place to objects. We unify the hierarchical context and the well-known spatial context into a complete hierarchical graphical model (HGM). In the HGM, object and part nodes carry both labels and the related pose information, rather than labels alone, for robust inference of objects. The most difficult problems of the HGM are learning and inferring variable graph structures. For tractability, we learn the HGM in a piecewise manner instead of by joint graph learning. Since inference includes variable structure estimation together with the marginal distribution of each node, we approximate the pseudo-likelihood of the marginal distribution using multimodal sequential Monte Carlo with weights updated by belief propagation. Data-driven multimodal hypotheses and context-based pruning lead to correct inference. For successful recognition, issues related to 3D object recognition are also considered, and several state-of-the-art methods are incorporated. The proposed system greatly reduces false alarms using the spatial and hierarchical contexts. We demonstrate the feasibility of HGM-based collaborative place, object, and part recognition in actual large-scale environments for guidance applications (12 places, 112 3D objects).
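To make the idea of hierarchical context concrete, the following is a minimal sketch of sum-product belief propagation on a toy graph with one place node linked to two object nodes. It is an illustration under stated assumptions, not the HGM itself: the place and object names, all probability values, and the helper functions are hypothetical; pose information, part nodes, spatial links between objects, and the multimodal sequential Monte Carlo step are all omitted. Because the toy graph is a star, message passing here is exact.

```python
# Toy hierarchical context: one place node connected to several object nodes.
# All names and numbers are made-up illustrative values, not taken from the paper.

PLACES = ["office", "kitchen"]
OBJECTS = ["monitor", "kettle"]
STATES = (True, False)  # object present / absent

# Bottom-up (local) evidence for the place and for each object.
place_evidence = {"office": 0.5, "kitchen": 0.5}
object_evidence = {
    "monitor": {True: 0.9, False: 0.1},
    "kettle": {True: 0.4, False: 0.6},  # weak, possibly spurious detection
}

# Pairwise compatibility between a place label and an object's presence.
compat = {
    ("office", "monitor"): {True: 0.8, False: 0.2},
    ("office", "kettle"): {True: 0.1, False: 0.9},
    ("kitchen", "monitor"): {True: 0.1, False: 0.9},
    ("kitchen", "kettle"): {True: 0.7, False: 0.3},
}


def normalize(dist):
    """Scale a dict of non-negative scores so that they sum to one."""
    total = sum(dist.values())
    return {key: value / total for key, value in dist.items()}


def object_to_place_message(obj):
    """Message from an object node to the place node (sum over object states)."""
    return {
        place: sum(object_evidence[obj][x] * compat[(place, obj)][x] for x in STATES)
        for place in PLACES
    }


def place_belief():
    """Place marginal: local evidence times all incoming object messages."""
    belief = dict(place_evidence)
    for obj in OBJECTS:
        message = object_to_place_message(obj)
        for place in PLACES:
            belief[place] *= message[place]
    return normalize(belief)


def object_belief(obj):
    """Object marginal: local evidence times the message coming from the place."""
    # The place-to-object message excludes the message this object sent itself.
    cavity = dict(place_evidence)
    for other in OBJECTS:
        if other == obj:
            continue
        message = object_to_place_message(other)
        for place in PLACES:
            cavity[place] *= message[place]
    belief = {
        x: object_evidence[obj][x]
        * sum(cavity[place] * compat[(place, obj)][x] for place in PLACES)
        for x in STATES
    }
    return normalize(belief)


if __name__ == "__main__":
    print("place:", place_belief())
    for name in OBJECTS:
        print(name, object_belief(name))
```

In this toy setting the strong monitor detection pushes the place belief toward "office", and that place belief in turn lowers the posterior for the weakly detected kettle below its local evidence. This is the kind of collaborative, two-way influence between place and object nodes that the abstract credits with reducing false alarms.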
