Semantic octree: Unifying recognition, reconstruction and representation via an octree constrained higher order MRF

On the one hand, mainly within the computer vision community, multi-resolution image labelling methods operating at the pixel, super-pixel and object levels have made great progress towards holistic scene understanding. On the other hand, mainly within the robotics and graphics communities, multi-resolution 3D representations of the world have matured to be efficient and accurate. In this paper we bring the two together and move towards unified recognition, reconstruction and representation. We tackle the problem by embedding an octree into a hierarchical robust P^N Markov Random Field. This allows us to jointly infer the multi-resolution 3D volume and the object-class labels, all within the constraints of an octree data structure. The octree representation is chosen because it is efficient for further processing such as dynamic updates, data compression and surface reconstruction. We demonstrate the efficacy of the approach by inferring our semantic octree on the KITTI Vision Benchmark Suite.
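To make the idea of an octree-constrained higher-order term concrete, the following is a minimal sketch of a robust higher-order potential evaluated over the eight children of a single octree node. It assumes the truncated-linear Robust P^N Potts form; the abstract does not state the exact potentials used, so the function name `robust_pn_potts`, the parameters `gamma_max` and `q`, and the example labels are all hypothetical and chosen only for illustration.

```python
from collections import Counter


def robust_pn_potts(labels, gamma_max, q):
    """Cost of one clique under a truncated-linear Robust P^N Potts potential.

    labels    -- labels of the variables in the clique (here: child voxels)
    gamma_max -- maximum cost, reached once q or more variables disagree
    q         -- truncation parameter (assumed > 0)
    """
    if not labels:
        return 0.0
    # Count how many variables deviate from the dominant label of the clique.
    dominant_count = Counter(labels).most_common(1)[0][1]
    n_disagree = len(labels) - dominant_count
    # Cost grows linearly with the number of deviating variables, then saturates.
    return min(n_disagree * gamma_max / q, gamma_max)


# Hypothetical example: the eight child voxels of one octree node.
children = ["road", "road", "road", "car", "road", "road", "road", "road"]
cost = robust_pn_potts(children, gamma_max=1.0, q=2)
```

Under these assumptions, a node whose children all agree costs nothing, a single deviating child is penalised only partially, and the penalty stops growing once `q` children disagree. That saturation is what lets a coarse octree cell favour a single label without forcing one onto every child.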
