Sparsity Invariant CNNs

In this paper, we consider convolutional neural networks operating on sparse inputs, with an application to depth upsampling from sparse laser scan data. First, we show that traditional convolutional networks perform poorly when applied to sparse data, even when the location of missing data is provided to the network. To overcome this problem, we propose a simple yet effective sparse convolution layer which explicitly considers the location of missing data during the convolution operation. We demonstrate the benefits of the proposed network architecture over various baseline approaches in experiments on synthetic and real data. Compared to dense baselines, the proposed sparse convolution network generalizes well to novel datasets and is invariant to the level of sparsity in the data. For our evaluation, we derive a novel dataset from the KITTI benchmark, comprising 93k depth-annotated RGB images. Our dataset allows for training and evaluating depth upsampling and depth prediction techniques in challenging real-world settings and will be made available upon publication.
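The abstract does not spell out the formula of the sparse convolution layer. The following is a minimal sketch of one way such a layer could work, under the assumption that each convolution response is normalized by the number of observed pixels under the kernel and that the validity mask is propagated with a max over the same window. The function name `sparse_conv2d`, the single-channel layout, the zero padding, and the `eps` constant are illustrative choices and are not taken from the text.

```python
import numpy as np

def sparse_conv2d(x, mask, weight, bias=0.0, eps=1e-8):
    """Mask-normalized 2D convolution (single channel, zero padding).

    x      : (H, W) input; values at unobserved pixels are ignored
    mask   : (H, W) binary array, 1 where x is observed
    weight : (k, k) kernel with odd k
    Returns (output, new_mask), both of shape (H, W).
    """
    k = weight.shape[0]
    p = k // 2
    # Zero out unobserved pixels so they cannot contribute to the sum.
    xm = np.pad(np.where(mask > 0, x, 0.0), p)
    m = np.pad(mask.astype(float), p)
    H, W = x.shape
    out = np.zeros((H, W))
    new_mask = np.zeros((H, W))
    for i in range(H):
        for j in range(W):
            win_x = xm[i:i + k, j:j + k]
            win_m = m[i:i + k, j:j + k]
            count = win_m.sum()  # number of observed pixels in the window
            # Assumed normalization: divide by the observed-pixel count.
            out[i, j] = (win_x * weight).sum() / (count + eps) + bias
            # Assumed mask update: valid if any input in the window was valid.
            new_mask[i, j] = 1.0 if count > 0 else 0.0
    return out, new_mask

# Usage: a constant depth field observed at roughly 30% of the pixels.
rng = np.random.default_rng(0)
x = np.full((8, 8), 5.0)
mask = (rng.random((8, 8)) < 0.3).astype(float)
out, new_mask = sparse_conv2d(x, mask, np.ones((3, 3)))
```

With an all-ones kernel and a constant input, the normalized response stays close to the input value wherever `new_mask` is 1, whatever fraction of pixels is observed. An unnormalized convolution would instead scale with the number of observed pixels in each window, which illustrates the sensitivity to sparsity that the abstract attributes to dense baselines.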
