A hierarchical inferential method for indoor scene classification

Abstract: Indoor scene classification forms a basis for scene interaction for service robots. The task is challenging because the layout and decoration of scenes vary considerably. Previous knowledge-based methods commonly ignore the importance of visual attributes when constructing the knowledge base, which restricts classification performance. We propose a semantic hierarchy that describes the similarities between different parts of scenes in a fine-grained way. Besides the commonly used semantic features, visual attributes are also introduced to construct the knowledge base. Inspired by the processes of human cognition and the characteristics of indoor scenes, we propose an inferential framework based on the Markov logic network. The framework is evaluated on a popular indoor scene dataset, and the experimental results demonstrate its effectiveness.
