Semantic indoor scenes recognition based on visual saliency and part-based features

This paper presents a semantic indoor scene recognition method for autonomous mobile robots. The proposed method comprises four steps: feature description using accelerated KAZE (AKAZE), feature selection using saliency maps (SMs), creation of bags of visual words (BoVWs) using self-organizing maps (SOMs), and scene recognition based on category maps using counter propagation networks (CPNs). The aim of this study is to evaluate combinations of saliency-based features for semantic indoor scene recognition. We conducted evaluation experiments on a public benchmark dataset to compare three types of feature sets. The results demonstrate basic properties of feature combinations that use part-based key-point descriptors selected from salient local regions consisting of generic objects.
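The middle stages of the pipeline described above (saliency-based feature selection, followed by a SOM codebook and BoVW histogram) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the saliency threshold, the one-dimensional SOM layout, the learning-rate schedule, and the use of small real-valued toy descriptors in place of AKAZE output are all assumptions made for illustration. The AKAZE extraction and CPN classification stages are omitted.

```python
import math
import random


def select_salient(keypoints, descriptors, saliency, threshold=0.5):
    """Keep descriptors whose key point (x, y) falls on a salient pixel.

    `saliency` is a 2-D list indexed as saliency[y][x]; the threshold
    value is an illustrative assumption.
    """
    return [d for (x, y), d in zip(keypoints, descriptors)
            if saliency[y][x] >= threshold]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def best_unit(codebook, d):
    """Index of the codebook unit closest to descriptor d."""
    return min(range(len(codebook)), key=lambda i: sq_dist(codebook[i], d))


def train_som(descriptors, n_units=4, epochs=20, seed=0):
    """Train a 1-D SOM whose units serve as visual words (assumed layout)."""
    rng = random.Random(seed)
    codebook = [list(rng.choice(descriptors)) for _ in range(n_units)]
    for epoch in range(epochs):
        frac = 1.0 - epoch / epochs            # decays from 1 toward 0
        lr = 0.5 * frac                        # learning rate
        radius = max(1.0, (n_units / 2) * frac)  # neighbourhood width
        for d in rng.sample(descriptors, len(descriptors)):
            win = best_unit(codebook, d)
            for i, unit in enumerate(codebook):
                # Gaussian neighbourhood around the winning unit
                h = math.exp(-((i - win) ** 2) / (2 * radius ** 2))
                for k in range(len(unit)):
                    unit[k] += lr * h * (d[k] - unit[k])
    return codebook


def bovw_histogram(codebook, descriptors):
    """Normalised histogram of visual-word assignments for one image."""
    hist = [0.0] * len(codebook)
    for d in descriptors:
        hist[best_unit(codebook, d)] += 1.0
    total = sum(hist)
    return [h / total for h in hist] if total else hist


# Toy usage: four key points on a 2x2 saliency map.
keypoints = [(0, 0), (1, 0), (0, 1), (1, 1)]
descriptors = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
saliency = [[0.9, 0.1], [0.2, 0.8]]
salient = select_salient(keypoints, descriptors, saliency)
codebook = train_som(descriptors, n_units=2)
histogram = bovw_histogram(codebook, salient)
```

In a full system, the histogram for each image would be passed to the classifier stage. AKAZE descriptors are binary by default in common implementations, so a Hamming distance might replace the Euclidean distance assumed here.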
