Deep Material-Aware Cross-Spectral Stereo Matching

Cross-spectral imaging provides strong benefits for recognition and detection tasks. It often relies on multiple cameras, which requires image alignment or, in a stereo setting, disparity estimation. Increasingly, multi-camera cross-spectral systems are embedded in active RGBD devices (e.g. RGB-NIR cameras in Kinect and iPhone X). Stereo matching between these cameras therefore also offers a way to obtain depth without an active projector source. However, matching images from different spectral bands is challenging because their appearance can differ greatly. We develop a novel deep learning framework that simultaneously transforms images across spectral bands and estimates disparity. A material-aware loss function is incorporated within the disparity prediction network to handle regions where matching is unreliable, such as light sources, glass windshields and glossy surfaces. Our method requires no depth supervision. To evaluate it, we used a vehicle-mounted RGB-NIR stereo system to collect 13.7 hours of video data across a range of areas in and around a city. Experiments show that our method achieves strong performance and runs in real time.
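The abstract does not spell out the training objective. A depth-free, material-aware loss can still be illustrated with a common unsupervised-stereo pattern: warp one view into the other using the predicted disparity, then penalize the photometric difference, scaling each pixel's contribution by a weight for its material class. The sketch below assumes rectified images that have already been mapped into a common spectral band (e.g. RGB translated to pseudo-NIR) and a per-pixel material label map. The class ids, the weights in `MATERIAL_WEIGHTS`, and both function names are hypothetical. This is a simplified stand-in, not the authors' loss.

```python
import numpy as np

def warp_right_to_left(right, disp):
    """Reconstruct the left view by sampling the right image at x - d.

    Assumes rectified stereo. right: (H, W) float array; disp: (H, W)
    non-negative disparity in pixels. Uses linear interpolation along each
    row; source coordinates outside the image are clamped to the border.
    """
    h, w = right.shape
    xs = np.arange(w)[None, :] - disp          # source x-coordinate in the right image
    xs = np.clip(xs, 0, w - 1)
    x0 = np.floor(xs).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    frac = xs - x0                             # interpolation weight toward x1
    rows = np.arange(h)[:, None]
    return (1 - frac) * right[rows, x0] + frac * right[rows, x1]

# Hypothetical material ids: 0 ordinary, 1 light source, 2 glass, 3 glossy.
# Unreliable materials are down-weighted because their appearance differs
# strongly between spectral bands (illustrative values, not from the paper).
MATERIAL_WEIGHTS = np.array([1.0, 0.0, 0.1, 0.1])

def material_weighted_photometric_loss(left, right, disp, material_ids,
                                       weights=MATERIAL_WEIGHTS):
    """Weighted mean L1 error between the left image and the warped right image."""
    recon = warp_right_to_left(right, disp)
    err = np.abs(left - recon)
    w = weights[material_ids]                  # per-pixel weight from the label map
    return float((w * err).sum() / max(w.sum(), 1e-8))
```

Because the loss depends only on the two images and the predicted disparity, it needs no ground-truth depth. Setting a class weight to zero removes pixels of that material, such as light sources, from the objective entirely.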
