Real Time Indoor 3D Pipeline for an Advanced Sensory Substitution Device

In this paper, we present the indoor 3D pipeline of an assistive system for visually impaired people, whose goal is to scan the environment, extract information of interest, and convey it to the user through haptics and sounds. The particularities of indoor scenes, which contain man-made objects with many planar faces, led us to build the 3D object recognition algorithms around a planar segmentation based on normal vectors. The 3D pipeline starts by acquiring depth frames from a range camera and synchronized IMU data from an inertial sensor. The pre-processing stage computes normal vectors at the 3D points of the scanned environment and filters them to reduce the noise in the input data. The next stages are planar segmentation and object labeling, which divide the scene into ground, ceiling, walls, and generic objects. The whole 3D pipeline runs in real time on a consumer laptop at approximately 15 fps. We describe each step of the pipeline, with a focus on the labeling stage, and present experimental results and ideas for further improvements.
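The labeling stage described above can be illustrated with a minimal sketch: using the gravity direction from the IMU, each point's normal is compared against the vertical axis to decide whether its surface is horizontal (ground or ceiling, disambiguated by height) or vertical (wall), with everything else falling back to a generic object label. This is not the paper's implementation; the function name, thresholds, and the median-height heuristic are illustrative assumptions.

```python
import numpy as np

def label_points(points, normals, up=np.array([0.0, 1.0, 0.0]),
                 angle_thresh_deg=15.0):
    """Label 3D points as 'ground', 'ceiling', 'wall', or 'object'
    from their surface normals. `up` is the gravity-aligned axis,
    e.g. estimated from the IMU data."""
    # Normalize the normals and take the cosine of their angle to `up`.
    n = normals / np.linalg.norm(normals, axis=1, keepdims=True)
    cos_up = n @ up

    cos_thresh = np.cos(np.radians(angle_thresh_deg))
    sin_thresh = np.sin(np.radians(angle_thresh_deg))

    labels = np.full(len(points), "object", dtype=object)
    horizontal = np.abs(cos_up) >= cos_thresh   # normal nearly parallel to up
    vertical = np.abs(cos_up) <= sin_thresh     # normal nearly perpendicular

    # Split horizontal surfaces into ground vs. ceiling by height
    # (here, crudely, relative to the median height of the cloud).
    heights = points[:, 1]
    median_h = np.median(heights)
    labels[horizontal & (heights < median_h)] = "ground"
    labels[horizontal & (heights >= median_h)] = "ceiling"
    labels[vertical] = "wall"
    return labels
```

In the actual pipeline the per-point decision would follow planar segmentation, so the label is assigned per plane rather than per point, which is far more robust to normal-estimation noise.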
