Multispectral Transfer Network: Unsupervised Depth Estimation for All-Day Vision

To understand the real-world, it is essential to perceive in all-day conditions including cases which are not suitable for RGB sensors, especially at night. Beyond these limitations, the innovation introduced here is a multispectral solution in the form of depth estimation from a thermal sensor without an additional depth sensor. Based on an analysis of multispectral properties and the relevance to depth predictions, we propose an efficient and novel multi-task framework called the Multispectral Transfer Network (MTN) to estimate a depth image from a single thermal image. By exploiting geometric priors and chromaticity clues, our model can generate a pixel-wise depth image in an unsupervised manner. Moreover, we propose a new type of multitask module called Interleaver as a means of incorporating the chromaticity and fine details of skip-connections into the depth estimation framework without sharing feature layers. Lastly, we explain a novel technical means of stably training and covering large disparities and extending thermal images to data-driven methods for all-day conditions. In experiments, we demonstrate the better performance and generalization of depth estimation through the proposed multispectral stereo dataset, including various driv-

[1]  Daniel Cremers,et al.  Large displacement optical flow computation withoutwarping , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[2]  Scott Sorensen,et al.  CATS: A Color and Thermal Stereo Benchmark , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[3]  Jitendra Malik,et al.  Hypercolumns for object segmentation and fine-grained localization , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[4]  Nassir Navab,et al.  Deeper Depth Prediction with Fully Convolutional Residual Networks , 2016, 2016 Fourth International Conference on 3D Vision (3DV).

[5]  Ali Farhadi,et al.  Deep3D: Fully Automatic 2D-to-3D Video Conversion with Deep Convolutional Neural Networks , 2016, ECCV.

[6]  Jae Shin Yoon,et al.  All-Day Visual Place Recognition : Benchmark Dataset and Baseline , 2015 .

[7]  Rob Fergus,et al.  Depth Map Prediction from a Single Image using a Multi-Scale Deep Network , 2014, NIPS.

[8]  Michael S. Brown,et al.  High quality depth map upsampling for 3D-TOF cameras , 2011, 2011 International Conference on Computer Vision.

[9]  Trevor Darrell,et al.  Learning with Side Information through Modality Hallucination , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[10]  Sang Uk Lee,et al.  Joint Depth Map and Color Consistency Estimation for Stereo Images with Different Illuminations and Cameras , 2013, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[11]  Hiroshi Ishikawa,et al.  Let there be color! , 2016, ACM Trans. Graph..

[12]  Andreas Geiger,et al.  Vision meets robotics: The KITTI dataset , 2013, Int. J. Robotics Res..

[13]  Angel Domingo Sappa,et al.  Infrared Image Colorization Based on a Triplet DCGAN Architecture , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[14]  Marc Pollefeys,et al.  Pulling Things out of Perspective , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[15]  Max Welling,et al.  Auto-Encoding Variational Bayes , 2013, ICLR.

[16]  Rob Fergus,et al.  Predicting Depth, Surface Normals and Semantic Labels with a Common Multi-scale Convolutional Architecture , 2014, 2015 IEEE International Conference on Computer Vision (ICCV).

[17]  Noah Snavely,et al.  Unsupervised Learning of Depth and Ego-Motion from Video , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[18]  Alexei A. Efros,et al.  Colorful Image Colorization , 2016, ECCV.

[19]  Qiao Wang,et al.  VirtualWorlds as Proxy for Multi-object Tracking Analysis , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[20]  Namil Kim,et al.  Multispectral pedestrian detection: Benchmark dataset and baseline , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[21]  Jörg Stückler,et al.  Semi-Supervised Deep Learning for Monocular Depth Map Prediction , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[22]  Yann LeCun,et al.  Deep multi-scale video prediction beyond mean square error , 2015, ICLR.

[23]  Shu Wang,et al.  Multispectral Deep Neural Networks for Pedestrian Detection , 2016, BMVC.

[24]  Angel Domingo Sappa,et al.  A Visible-Thermal Fusion Based Monocular Visual Odometry , 2015, ROBOT.

[25]  Sergey Ioffe,et al.  Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift , 2015, ICML.

[26]  Jitendra Malik,et al.  Cross Modal Distillation for Supervision Transfer , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[27]  Ian D. Reid,et al.  Learning Depth from Single Monocular Images Using Deep Convolutional Neural Fields , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[28]  Andrew Zisserman,et al.  Spatial Transformer Networks , 2015, NIPS.

[29]  Oisin Mac Aodha,et al.  Unsupervised Monocular Depth Estimation with Left-Right Consistency , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[30]  Camille Couprie,et al.  Learning Hierarchical Features for Scene Labeling , 2013, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[31]  Andrea Vedaldi,et al.  MatConvNet: Convolutional Neural Networks for MATLAB , 2014, ACM Multimedia.

[32]  Kaiming He,et al.  Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[33]  Namil Kim,et al.  Pixel-Level Domain Transfer , 2016, ECCV.

[34]  Kavita Bala,et al.  Inside-Outside Net: Detecting Objects in Context with Skip Pooling and Recurrent Neural Networks , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[35]  Namil Kim,et al.  Thermal Image Enhancement using Convolutional Neural Network , 2016, 2016 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[36]  Sebastian Ramos,et al.  The Cityscapes Dataset for Semantic Urban Scene Understanding , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[37]  Gustavo Carneiro,et al.  Unsupervised CNN for Single View Depth Estimation: Geometry to the Rescue , 2016, ECCV.

[38]  Angel Domingo Sappa,et al.  Multispectral piecewise planar stereo using Manhattan-world assumption , 2013, Pattern Recognit. Lett..

[39]  Iasonas Kokkinos,et al.  UberNet: Training a Universal Convolutional Neural Network for Low-, Mid-, and High-Level Vision Using Diverse Datasets and Limited Memory , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[40]  Weifeng Chen,et al.  Single-Image Depth Perception in the Wild , 2016, NIPS.

[41]  Chunhua Shen,et al.  Depth and surface normal estimation from monocular images using regression on deep features and hierarchical CRFs , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[42]  Yann LeCun,et al.  Stereo Matching by Training a Convolutional Neural Network to Compare Image Patches , 2015, J. Mach. Learn. Res..

[43]  Hendrik P. A. Lensch,et al.  Infrared Colorization Using Deep Convolutional Neural Networks , 2016, 2016 15th IEEE International Conference on Machine Learning and Applications (ICMLA).