High-Precision Depth Estimation with the 3D LiDAR and Stereo Fusion

We present a deep convolutional neural network (CNN) architecture for high-precision depth estimation that jointly uses sparse 3D LiDAR and dense stereo depth information. The network encodes the complementary characteristics of sparse 3D LiDAR and dense stereo depth simultaneously, in a boosting manner. Tailored to the LiDAR and stereo fusion problem, the proposed network differs from previous CNNs in that it incorporates a compact convolution module, which can be deployed under the constraints of mobile devices. Because training data for LiDAR and stereo fusion is rather limited, we introduce a simple yet effective approach for reproducing the raw KITTI dataset: the raw LiDAR scans are augmented by adapting an off-the-shelf stereo algorithm and a confidence measure. We evaluate the proposed network on the KITTI benchmark and on data collected by our multi-sensor acquisition system. Experiments demonstrate that the proposed network generalizes across datasets and is significantly more accurate than various baseline approaches.
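
The idea of augmenting raw LiDAR scans with stereo depth and a confidence measure can be illustrated with a minimal sketch. This is not the procedure from the paper, whose details the abstract does not give. It assumes that all maps are aligned to the same image grid, that a depth of 0 marks a missing measurement, and that confidence lies in [0, 1]. The function name `augment_sparse_depth` and the threshold `conf_threshold` are hypothetical.

```python
import numpy as np

def augment_sparse_depth(lidar_depth, stereo_depth, stereo_conf, conf_threshold=0.9):
    """Fill pixels without a LiDAR return using confident stereo depth.

    Assumptions (illustrative only): all inputs are H x W float arrays on the
    same image grid, and a depth of 0 means "no measurement".
    """
    lidar_valid = lidar_depth > 0
    # Stereo is trusted only where it has a depth value and high confidence.
    stereo_ok = (stereo_depth > 0) & (stereo_conf >= conf_threshold)
    # Fill only pixels that LiDAR missed; LiDAR values are never overwritten.
    fill = ~lidar_valid & stereo_ok
    out = lidar_depth.copy()
    out[fill] = stereo_depth[fill]
    return out, fill

# Toy 2 x 2 example with made-up values (depth in meters).
lidar = np.array([[0.0, 10.0], [0.0, 0.0]])
stereo = np.array([[5.0, 11.0], [7.0, 0.0]])
conf = np.array([[0.95, 0.99], [0.5, 0.99]])
dense, filled = augment_sparse_depth(lidar, stereo, conf)
```

In this toy case only the top-left pixel is filled from stereo. The top-right pixel keeps its LiDAR value, the bottom-left stereo estimate is rejected for low confidence, and the bottom-right pixel has no stereo depth at all.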
