Context-Based Scene Understanding

Context plays an important role in the performance of object detection. Two considerations are common when building context models for computer vision applications: the type of context (semantic, spatial, scale) and the scope of the relations (pairwise, high-order). In this paper, a new unified framework is presented that combines multiple sources of context in high-order relations to encode the semantic coherence and consistency of scenes. The framework introduces a new descriptor, called the context relevance score, to model the context-based distribution of the response variables, and applies it in two models. The first model incorporates the context descriptor, along with the annotation response, into a supervised Latent Dirichlet Allocation (LDA) built on a multivariate Bernoulli distribution, called Context-Based LDA (CBLDA). The second model is based on the multivariate Wallenius' noncentral hypergeometric distribution and is called Wallenius LDA (WLDA). WLDA incorporates context knowledge as a bias parameter. Scene context is modeled as a graph and used in the object detection framework to maximize the semantic consistency of the scene. The graph can also be used to recognize out-of-context objects. Annotation metadata from the SUN397 dataset is used to construct the context model. The performance of the proposed approaches was evaluated on the ImageNet dataset. A comparison with a state-of-the-art multi-class object annotation algorithm shows that the proposed approaches outperform it in labeling scene content.
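
To illustrate how a bias parameter shapes the multivariate Wallenius' noncentral hypergeometric distribution, the following minimal sketch simulates its urn definition: items are drawn one at a time without replacement, and at each step a category is chosen with probability proportional to its weight times its remaining count. This is not the WLDA model itself. The function name `wallenius_draw`, the object labels, and the weight values are hypothetical, and treating a context relevance value as the weight is an assumption made only for illustration.

```python
import random
from collections import Counter


def wallenius_draw(counts, weights, n, rng):
    """Simulate one multivariate Wallenius draw of n items.

    counts  : dict mapping category -> number of items in the urn
    weights : dict mapping category -> positive bias weight (assumed here
              to stand in for a context-derived relevance value)
    n       : number of items drawn without replacement
    rng     : a random.Random instance
    """
    remaining = dict(counts)
    if n > sum(remaining.values()):
        raise ValueError("cannot draw more items than the urn holds")
    drawn = Counter()
    for _ in range(n):
        # Only categories that still have items can be picked.
        labels = [c for c in remaining if remaining[c] > 0]
        # Biased step: probability proportional to weight * remaining count.
        mass = [weights[c] * remaining[c] for c in labels]
        pick = rng.choices(labels, weights=mass, k=1)[0]
        remaining[pick] -= 1
        drawn[pick] += 1
    return drawn


# Hypothetical street-scene example: "sofa" is given a low weight.
counts = {"car": 5, "road": 5, "sofa": 5}
weights = {"car": 4.0, "road": 4.0, "sofa": 0.25}
print(wallenius_draw(counts, weights, 6, random.Random(0)))
```

With equal weights the procedure reduces to ordinary sampling without replacement (the central multivariate hypergeometric case); unequal weights shift the draws toward the favored categories.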
