Paper Information - Multispectral Transfer Network: Unsupervised Depth Estimation for All-Day Vision

To understand the real world, it is essential to perceive in all-day conditions, including cases that are not suitable for RGB sensors, especially at night. Beyond these limitations, the innovation introduced here is a multispectral solution in the form of depth estimation from a thermal sensor without an additional depth sensor. Based on an analysis of multispectral properties and their relevance to depth prediction, we propose an efficient and novel multi-task framework called the Multispectral Transfer Network (MTN) to estimate a depth image from a single thermal image. By exploiting geometric priors and chromaticity clues, our model can generate a pixel-wise depth image in an unsupervised manner. Moreover, we propose a new type of multi-task module called Interleaver as a means of incorporating the chromaticity and fine details of skip-connections into the depth estimation framework without sharing feature layers. Lastly, we explain a novel technical means of training stably, covering large disparities, and extending thermal images to data-driven methods for all-day conditions. In experiments, we demonstrate the better performance and generalization of depth estimation through the proposed multispectral stereo dataset, including various driv-
