Building an Enhanced Vocabulary of the Robot Environment with a Ceiling Pointing Camera

Mobile robots are of great help for automatic monitoring tasks in different environments. One of the first tasks that needs to be addressed when creating these kinds of robotic systems is modeling the robot environment. This work proposes a pipeline to build an enhanced visual model of an indoor robot environment. Vision-based recognition approaches frequently use quantized feature spaces, commonly known as Bag of Words (BoW) or vocabulary representations. A drawback of standard BoW approaches is that semantic information is not considered as a criterion for creating the visual words. To address this limitation, this paper studies how to adapt the standard vocabulary construction process to obtain a more meaningful visual vocabulary of the robot work environment from image sequences. We take advantage of spatio-temporal constraints and prior knowledge about the position of the camera. The key contribution of our work is the definition of a new pipeline to create a model of the environment. This pipeline incorporates (1) tracking information into the vocabulary construction process and (2) geometric cues into the appearance descriptors. Motivated by long-term robotic applications, such as the aforementioned monitoring tasks, we focus on a configuration where the robot camera points at the ceiling, which captures more stable regions of the environment. The experimental validation shows that our vocabulary models the environment in more detail than standard vocabulary approaches, without loss of recognition performance. We show different robotic tasks that could benefit from our visual vocabulary approach, such as place recognition or object discovery. For this validation, we use our publicly available dataset.
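For readers unfamiliar with the standard BoW representation that this work takes as its baseline, the following minimal sketch may help. It is not the pipeline proposed here: it uses neither tracking information nor geometric cues, and it assumes plain k-means clustering as the quantizer, which the abstract does not specify. The function names (`build_vocabulary`, `bow_histogram`) and the toy 2-D descriptors are hypothetical and chosen for illustration only; real appearance descriptors would have many more dimensions.

```python
# Baseline BoW sketch (assumption: k-means quantization, toy 2-D descriptors).

def squared_distance(a, b):
    """Squared Euclidean distance between two descriptors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest_word(descriptor, vocabulary):
    """Index of the visual word (cluster center) closest to a descriptor."""
    return min(range(len(vocabulary)),
               key=lambda i: squared_distance(descriptor, vocabulary[i]))

def build_vocabulary(descriptors, k, iterations=20):
    """Cluster descriptors into k visual words with plain k-means."""
    # Deterministic initialization: k evenly spaced descriptors as centers.
    step = len(descriptors) // k
    centers = [list(descriptors[i * step]) for i in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for d in descriptors:
            clusters[nearest_word(d, centers)].append(d)
        for i, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous center
                centers[i] = [sum(c) / len(members) for c in zip(*members)]
    return centers

def bow_histogram(descriptors, vocabulary):
    """Represent one image as counts of its descriptors per visual word."""
    histogram = [0] * len(vocabulary)
    for d in descriptors:
        histogram[nearest_word(d, vocabulary)] += 1
    return histogram

# Toy training descriptors forming two well-separated groups.
training = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
vocabulary = build_vocabulary(training, k=2)

# Descriptors of a hypothetical new image, quantized against the vocabulary.
image_descriptors = [(0.2, 0.1), (9, 9), (10, 12)]
image_histogram = bow_histogram(image_descriptors, vocabulary)
```

In this baseline, words are formed purely from descriptor similarity, which is the limitation noted above: nothing in the clustering step reflects semantic information about the environment.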
