Deep Learning for Automated Occlusion Edge Detection in RGB-D Frames

Occlusion edges correspond to range discontinuities in a scene as seen from the observer's point of view. Detecting them is an important prerequisite for many machine vision and mobile robotics tasks. Although occlusion edges can be extracted from range data, extracting them from images and videos would be extremely beneficial. We trained a deep convolutional neural network (CNN) to identify occlusion edges in images and videos using RGB, RGB-D, and RGB-D-UV inputs, where D denotes depth and UV denotes the horizontal and vertical components of the optical flow field, respectively. Using a CNN avoids hand-crafting features to isolate occlusion edges automatically and to distinguish them from appearance edges. In addition to quantitative occlusion edge detection results, qualitative results are provided to evaluate input data requirements and to demonstrate the trade-off between high-resolution analysis and frame-level computation time, which is critical for real-time robotics applications.
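
To make the two ideas above concrete, the following minimal sketch marks range discontinuities in a depth map and stacks the RGB, D, and UV channels into one input array. It is an illustration only, not the authors' labeling or training pipeline: the function names, the 4-neighbour comparison, and the `threshold` value are assumptions introduced here.

```python
import numpy as np


def occlusion_edges_from_depth(depth, threshold=0.1):
    """Mark pixels whose depth differs from a 4-neighbour by more than `threshold`.

    `threshold` is a hypothetical value in the same units as `depth`.
    """
    depth = np.asarray(depth, dtype=float)
    edges = np.zeros(depth.shape, dtype=bool)

    # Depth jumps between horizontally adjacent pixels, shape (H, W-1).
    jump_x = np.abs(np.diff(depth, axis=1)) > threshold
    edges[:, :-1] |= jump_x  # pixel on the left side of the jump
    edges[:, 1:] |= jump_x   # pixel on the right side of the jump

    # Depth jumps between vertically adjacent pixels, shape (H-1, W).
    jump_y = np.abs(np.diff(depth, axis=0)) > threshold
    edges[:-1, :] |= jump_y  # pixel above the jump
    edges[1:, :] |= jump_y   # pixel below the jump

    return edges


def stack_rgbd_uv(rgb, depth, flow_u, flow_v):
    """Stack RGB (H, W, 3) with depth and optical flow (each H x W) into (H, W, 6)."""
    return np.dstack([rgb, depth, flow_u, flow_v])


if __name__ == "__main__":
    # Synthetic scene: a near surface on the left, a farther one on the right.
    depth = np.ones((4, 6))
    depth[:, 3:] = 2.0
    edges = occlusion_edges_from_depth(depth, threshold=0.5)
    print(edges.astype(int))

    rgb = np.zeros((4, 6, 3))
    flow = np.zeros((4, 6))
    print(stack_rgbd_uv(rgb, depth, flow, flow).shape)
```

A depth-based mask of this kind is one plausible way to obtain edge labels from range data, while the stacked array shows how the extra D and UV channels could sit alongside RGB as network input.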
