SteReFo: Efficient Image Refocusing with Stereo Vision

Whether to attract viewer attention to a particular object, give the impression of depth, or simply reproduce human-like scene perception, shallow-depth-of-field images are used extensively by professional and amateur photographers alike. To this end, high-quality optical systems in DSLR cameras focus on a specific depth plane while producing visually pleasing bokeh. We propose a physically motivated pipeline that mimics this effect from all-in-focus stereo images, such as those typically captured by mobile cameras. It can change the focal plane a posteriori at 76 FPS on KITTI images, enabling real-time applications. As our portmanteau suggests, SteReFo interrelates stereo-based depth estimation and refocusing efficiently. In contrast to other approaches, our pipeline is simultaneously fully differentiable, physically motivated, and agnostic to scene content. It also enables computational video focus tracking for moving objects in addition to refocusing of static images. We evaluate our approach on the publicly available SceneFlow, KITTI, and CityScapes datasets and quantify the quality of architectural changes.
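To illustrate the kind of depth-guided refocusing the abstract describes, the sketch below implements a minimal layered defocus: the scene is sliced into depth layers, each layer is blurred according to a thin-lens circle-of-confusion model, and the layers are composited with normalized weights. This is an illustrative assumption, not the paper's actual pipeline (which uses learned stereo depth and a differentiable blur); the function names, the simple box-blur kernel standing in for a disc-shaped bokeh kernel, and all camera parameters are hypothetical.

```python
import numpy as np

def coc_radius(depth, focus_depth, focal_length, aperture, px_per_m):
    """Thin-lens circle-of-confusion radius in pixels.

    CoC diameter = A * f * |z - z_f| / (z * (z_f - f)); zero at the
    focal plane, growing with distance from it.
    """
    coc_m = aperture * focal_length * np.abs(depth - focus_depth) / (
        depth * (focus_depth - focal_length))
    return 0.5 * coc_m * px_per_m

def box_blur(img, radius):
    """Naive box blur; a stand-in for a disc (bokeh) kernel in this sketch."""
    if radius < 1:
        return img
    r = int(radius)
    padded = np.pad(img, ((r, r), (r, r), (0, 0)), mode="edge")
    out = np.zeros_like(img, dtype=float)
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            out += padded[r + dy: r + dy + img.shape[0],
                          r + dx: r + dx + img.shape[1]]
    return out / (2 * r + 1) ** 2

def refocus(image, depth, focus_depth, n_layers=8, **cam):
    """Layered refocusing: blur each depth slice by its CoC, then
    composite with normalized per-pixel weights (back-to-front order
    is approximated here by the normalization)."""
    edges = np.linspace(depth.min(), depth.max(), n_layers + 1)
    out = np.zeros_like(image, dtype=float)
    weight = np.zeros(depth.shape + (1,))
    for i in range(n_layers):
        hi = edges[i + 1] if i < n_layers - 1 else edges[-1] + 1e-9
        mask = ((depth >= edges[i]) & (depth < hi)).astype(float)[..., None]
        z = 0.5 * (edges[i] + edges[i + 1])            # layer's mean depth
        r = coc_radius(z, focus_depth, cam["focal_length"],
                       cam["aperture"], cam["px_per_m"])
        out += box_blur(image * mask, r)               # blurred layer color
        weight += box_blur(mask, r)                    # blurred coverage
    return out / np.maximum(weight, 1e-8)
```

In the actual method, every operation in such a pipeline (including the depth-dependent blur) is differentiable, so refocusing can be trained end-to-end together with the stereo network; the sketch only conveys the physical model.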
