Depth map prediction from a single image with generative adversarial nets

A depth map is a fundamental component of 3D reconstruction, and predicting one from a single image is a challenging task in computer vision. In this paper, we treat depth prediction as an image-to-image task and propose an adversarial convolutional architecture, the Depth Generative Adversarial Network (DepthGAN). To strengthen its image translation ability, we combine a Fully Convolutional Residual Network (FCRN) with a generative adversarial network, an approach that has shown remarkable results in image-to-image tasks. We also present a new loss function that includes the scale-invariant (SI) error and the structural similarity (SSIM) loss, which improves the model and yields a high-quality depth map. Experiments on the NYU Depth v2 dataset show that DepthGAN outperforms the previous best method in monocular depth prediction.
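The two loss terms named in the abstract can be illustrated with a minimal NumPy sketch. It is not the paper's implementation: the abstract does not give the exact formulation or weighting, so the function names, the `lam`, `alpha`, and `data_range` parameters, and the single-window (global) SSIM used here are all assumptions made for illustration. The SI error follows the common log-difference form, and the SSIM term uses the standard constants (0.01 and 0.03 times the data range, squared).

```python
import numpy as np


def scale_invariant_error(pred, target, lam=1.0, eps=1e-8):
    """Scale-invariant error in log space (lam=1.0 is fully scale invariant).

    Assumption: the usual form mean(d^2) - lam * (sum(d))^2 / n^2,
    where d = log(pred) - log(target).
    """
    d = np.log(pred + eps) - np.log(target + eps)
    n = d.size
    return float(np.mean(d ** 2) - lam * (d.sum() ** 2) / (n ** 2))


def global_ssim(x, y, data_range=1.0):
    """SSIM computed over the whole image as one window.

    Assumption: a simplification of the usual sliding-window SSIM.
    """
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov = ((x - mu_x) * (y - mu_y)).mean()
    num = (2 * mu_x * mu_y + c1) * (2 * cov + c2)
    den = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return float(num / den)


def combined_loss(pred, target, alpha=0.5, data_range=10.0):
    """Hypothetical combination: SI error plus alpha * (1 - SSIM).

    alpha and data_range are illustrative values, not taken from the paper.
    """
    ssim_term = 1.0 - global_ssim(pred, target, data_range)
    return scale_invariant_error(pred, target) + alpha * ssim_term
```

With `lam=1.0`, multiplying a prediction by a constant factor leaves the SI error essentially unchanged, which is why it is often paired with a structural term such as SSIM that penalizes differences in local structure. In an adversarial setup, a loss of this kind would typically be added to the generator's adversarial objective.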
