Lost Shopping! Monocular Localization in Large Indoor Spaces

In this paper we propose a novel approach to localization in very large indoor spaces (i.e., shopping malls with 200+ stores) that takes a single image and a floor plan of the environment as input. We formulate the localization problem as inference in a Markov random field, which jointly reasons about text detection (localizing shop names in the image with precise bounding boxes), shop facade segmentation, and the camera's rotation and translation within the entire shopping mall. The strength of our approach is that it does not use any prior information about appearance and instead exploits text detections corresponding to the shop names. This makes our method applicable to a variety of domains and robust to variation in store appearance across countries, seasons, and illumination conditions. We demonstrate the performance of our approach on a new dataset we collected of two very large shopping malls, and show the power of holistic reasoning.
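To make the core idea concrete, the following is a minimal toy sketch of localizing from shop-name text alone. It is not the Markov random field formulation proposed in the paper: it ignores facade segmentation and camera rotation, and simply scores candidate positions by fuzzy-matching noisy text detections against the shops that would be nearby on a floor plan. The floor plan, the coordinates, the `view_radius` parameter, and all function names are hypothetical and chosen only for illustration.

```python
from difflib import SequenceMatcher
from math import hypot

# Hypothetical floor plan: shop name -> (x, y) facade centre in metres.
FLOOR_PLAN = {
    "STARBUCKS": (5.0, 0.0),
    "ZARA": (15.0, 0.0),
    "APPLE": (25.0, 0.0),
    "SEPHORA": (35.0, 0.0),
}


def similarity(a, b):
    """Case-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, a.upper(), b.upper()).ratio()


def score_position(position, detections, floor_plan, view_radius=8.0):
    """Sum, over detected strings, of the best match among nearby shops.

    Assumption: a shop is 'visible' if its facade centre lies within
    view_radius metres of the candidate position (no occlusion model).
    """
    px, py = position
    visible = [
        name
        for name, (x, y) in floor_plan.items()
        if hypot(x - px, y - py) <= view_radius
    ]
    if not visible:
        return 0.0
    return sum(max(similarity(d, name) for name in visible) for d in detections)


def localize(detections, floor_plan, candidates, view_radius=8.0):
    """Return the candidate position with the highest matching score."""
    return max(
        candidates,
        key=lambda p: score_position(p, detections, floor_plan, view_radius),
    )


# Candidate positions sampled every 5 m along a corridor at y = 3 m.
candidates = [(float(x), 3.0) for x in range(0, 41, 5)]

# Noisy text detections, e.g. imperfect OCR of "ZARA" and "APPLE".
best = localize(["ZAR4", "APPIE"], FLOOR_PLAN, candidates)
print(best)
```

Because the score depends only on text and the floor plan, the sketch uses no appearance model, which mirrors the property the abstract highlights. A joint model such as the one described above would additionally couple these text matches with facade boundaries and the full camera pose.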
