Automatic dense visual semantic mapping from street-level imagery

This paper describes a method for producing a semantic map from multi-view street-level imagery. We define a semantic map as an overhead, or bird's eye view of a region with associated semantic object labels, such as car, road and pavement. We formulate the problem using two conditional random fields. The first is used to model the semantic image segmentation of the street view imagery treating each image independently. The outputs of this stage are then aggregated over many images to form the input for our semantic map that is a second random field defined over a ground plane. Each image is related by a simple, yet effective, geometrical function that back projects a region from the street view image into the overhead ground plane map. We introduce, and make publicly available, a new dataset created from real world data. Our qualitative evaluation is performed on this data consisting of a 14.8 km track, and we also quantify our results on a representative subset.

[1]  Paul Newman,et al.  Navigating, Recognizing and Describing Urban Spaces With Vision and Lasers , 2009, Int. J. Robotics Res..

[2]  Jianguo Zhang,et al.  The PASCAL Visual Object Classes Challenge , 2006 .

[3]  Wolfram Burgard,et al.  Improving robot navigation in structured outdoor environments by identifying vegetation from laser data , 2009, 2009 IEEE/RSJ International Conference on Intelligent Robots and Systems.

[4]  Joachim Hertzberg,et al.  Towards semantic maps for mobile robots , 2008, Robotics Auton. Syst..

[5]  Sebastian Thrun,et al.  Self-supervised Monocular Road Detection in Desert Terrain , 2006, Robotics: Science and Systems.

[6]  Pushmeet Kohli,et al.  Dynamic Graph Cuts for Efficient Inference in Markov Random Fields , 2007, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[7]  Bernhard P. Wrobel,et al.  Multiple View Geometry in Computer Vision , 2001 .

[8]  Javier Civera,et al.  Towards semantic SLAM using a monocular camera , 2011, 2011 IEEE/RSJ International Conference on Intelligent Robots and Systems.

[9]  Pushmeet Kohli,et al.  Associative hierarchical CRFs for object class image segmentation , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[10]  Giulio Reina,et al.  Combining radar and vision for self-supervised ground segmentation in outdoor environments , 2011, 2011 IEEE/RSJ International Conference on Intelligent Robots and Systems.

[11]  Roland Siegwart,et al.  BRISK: Binary Robust invariant scalable keypoints , 2011, 2011 International Conference on Computer Vision.

[12]  Olga Veksler,et al.  Fast Approximate Energy Minimization via Graph Cuts , 2001, IEEE Trans. Pattern Anal. Mach. Intell..

[13]  Paul Newman,et al.  A generative framework for fast urban labeling using spatial and temporal context , 2009, Auton. Robots.

[14]  Vibhav Vineet,et al.  Filter-Based Mean-Field Inference for Random Fields with Higher-Order Terms and Product Label-Spaces , 2012, ECCV.

[15]  Paul Newman,et al.  Fast Probabilistic Labeling of City Maps , 2008, Robotics: Science and Systems.

[16]  Marie-Pierre Jolly,et al.  Interactive Graph Cuts for Optimal Boundary and Region Segmentation of Objects in N-D Images , 2001, ICCV.

[17]  Luc Van Gool,et al.  The 2005 PASCAL Visual Object Classes Challenge , 2005, MLCW.

[18]  Dorin Comaniciu,et al.  Mean Shift: A Robust Approach Toward Feature Space Analysis , 2002, IEEE Trans. Pattern Anal. Mach. Intell..

[19]  Sebastian Scherer,et al.  Perception for a river mapping robot , 2011, 2011 IEEE/RSJ International Conference on Intelligent Robots and Systems.

[20]  Philip H. S. Torr,et al.  Combining Appearance and Structure from Motion Features for Road Scene Understanding , 2009, BMVC.

[21]  Antonio Criminisi,et al.  TextonBoost: Joint Appearance, Shape and Context Modeling for Multi-class Object Recognition and Segmentation , 2006, ECCV.

[22]  Vincent Lepetit,et al.  BRIEF: Binary Robust Independent Elementary Features , 2010, ECCV.

[23]  Marie-Pierre Jolly,et al.  Interactive graph cuts for optimal boundary & region segmentation of objects in N-D images , 2001, Proceedings Eighth IEEE International Conference on Computer Vision. ICCV 2001.

[24]  D. Fox,et al.  Classification and Semantic Mapping of Urban Environments , 2011, Int. J. Robotics Res..

[25]  W. F. Clocksin,et al.  Joint Optimization for Object Class Segmentation and Dense Stereo Reconstruction , 2011, International Journal of Computer Vision.

[26]  C. V. Jawahar,et al.  Scene Text Recognition using Higher Order Language Priors , 2009, BMVC.