Rendering Portraitures from Monocular Camera and Beyond

Shallow Depth-of-Field (DoF) is a desirable effect in photography which renders artistic photos. Usually, it requires single-lens reflex cameras and certain photography skills to generate such effects. Recently, dual-lens on cellphones is used to estimate scene depth and simulate DoF effects for portrait shots. However, this technique cannot be applied to photos already taken and does not work well for whole-body scenes where the subject is at a distance from the cameras. In this work, we introduce an automatic system that achieves portrait DoF rendering for monocular cameras. Specifically, we first exploit Convolutional Neural Networks to estimate the relative depth and portrait segmentation maps from a single input image. Since these initial estimates from a single input are usually coarse and lack fine details, we further learn pixel affinities to refine the coarse estimation maps. With the refined estimation, we conduct depth and segmentation-aware blur rendering to the input image with a Conditional Random Field and image matting. In addition, we train a spatially-variant Recursive Neural Network to learn and accelerate this rendering process. We show that the proposed algorithm can effectively generate portraitures with realistic DoF effects using one single input. Experimental results also demonstrate that our depth and segmentation estimation modules perform favorably against the state-of-the-art methods both quantitatively and qualitatively.

[1]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[2]  Frédo Durand,et al.  Defocus Magnification , 2007, Comput. Graph. Forum.

[3]  Frédo Durand,et al.  Fourier depth of field , 2009, TOGS.

[4]  H. Seidel,et al.  Real-time lens blur effects and focus control , 2010, ACM Trans. Graph..

[5]  Deqing Sun,et al.  Learning to Super-Resolve Blurry Face and Text Images , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[6]  Weifeng Chen,et al.  Single-Image Depth Perception in the Wild , 2016, NIPS.

[7]  Campbell Fw A method for measuring the depth of field of the human eye. , 1954 .

[8]  Ali Farhadi,et al.  Deep3D: Fully Automatic 2D-to-3D Video Conversion with Deep Convolutional Neural Networks , 2016, ECCV.

[9]  Chenyang Liu,et al.  Temporal Attention Network for Action Proposal , 2018, 2018 25th IEEE International Conference on Image Processing (ICIP).

[10]  Timo Schairer,et al.  Realistic Depth Blur for Images with Range Data , 2009, Dyn3D.

[11]  Chi-Keung Tang,et al.  KNN Matting , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[12]  Nassir Navab,et al.  Deeper Depth Prediction with Fully Convolutional Residual Networks , 2016, 2016 Fourth International Conference on 3D Vision (3DV).

[13]  Rob Fergus,et al.  Depth Map Prediction from a Single Image using a Multi-Scale Deep Network , 2014, NIPS.

[14]  Derek Hoiem,et al.  Indoor Segmentation and Support Inference from RGBD Images , 2012, ECCV.

[15]  Oisin Mac Aodha,et al.  Unsupervised Monocular Depth Estimation with Left-Right Consistency , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[16]  Tieniu Tan,et al.  Early Hierarchical Contexts Learned by Convolutional Networks for Image Segmentation , 2014, 2014 22nd International Conference on Pattern Recognition.

[17]  Renjie Liao,et al.  Deep Edge-Aware Filters , 2015, ICML.

[18]  Iasonas Kokkinos,et al.  Semantic Image Segmentation with Deep Convolutional Nets and Fully Connected CRFs , 2014, ICLR.

[19]  Jan Kautz,et al.  Learning Affinity via Spatial Propagation Networks , 2017, NIPS.

[20]  Rob Fergus,et al.  Predicting Depth, Surface Normals and Semantic Labels with a Common Multi-scale Convolutional Architecture , 2014, 2015 IEEE International Conference on Computer Vision (ICCV).

[21]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[22]  Andreas Geiger,et al.  Are we ready for autonomous driving? The KITTI vision benchmark suite , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[23]  Yang Wang,et al.  Realistic rendering of bokeh effect based on optical aberrations , 2010, The Visual Computer.

[24]  Gustavo Carneiro,et al.  Unsupervised CNN for Single View Depth Estimation: Geometry to the Rescue , 2016, ECCV.

[25]  Jim X. Chen,et al.  Accurate Depth of Field Simulation in Real Time , 2007, Comput. Graph. Forum.

[26]  Junwei Han,et al.  DHSNet: Deep Hierarchical Saliency Network for Salient Object Detection , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[27]  Ming-Hsuan Yang,et al.  Learning Recursive Filters for Low-Level Vision via a Hybrid Neural Network , 2016, ECCV.

[28]  Sylvain Paris,et al.  Automatic Portrait Segmentation for Image Stylization , 2016, Comput. Graph. Forum.

[29]  Rui Wang,et al.  Real‐time Depth of Field Rendering via Dynamic Light Field Generation and Filtering , 2010, Comput. Graph. Forum.

[30]  Marie-Pierre Jolly,et al.  Interactive graph cuts for optimal boundary & region segmentation of objects in N-D images , 2001, Proceedings Eighth IEEE International Conference on Computer Vision. ICCV 2001.

[31]  Sebastian Ramos,et al.  The Cityscapes Dataset for Semantic Urban Scene Understanding , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[32]  Vibhav Vineet,et al.  Conditional Random Fields as Recurrent Neural Networks , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[33]  Jonathan T. Barron,et al.  Fast bilateral-space stereo for synthetic defocus , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[34]  Xiangyu Xu,et al.  Motion Blur Kernel Estimation via Deep Learning. , 2018, IEEE transactions on image processing : a publication of the IEEE Signal Processing Society.