Deep View Morphing

Recently, convolutional neural networks (CNN) have been successfully applied to view synthesis problems. However, such CNN-based methods can suffer from lack of texture details, shape distortions, or high computational complexity. In this paper, we propose a novel CNN architecture for view synthesis called Deep View Morphing that does not suffer from these issues. To synthesize a middle view of two input images, a rectification network first rectifies the two input images. An encoder-decoder network then generates dense correspondences between the rectified images and blending masks to predict the visibility of pixels of the rectified images in the middle view. A view morphing network finally synthesizes the middle view using the dense correspondences and blending masks. We experimentally show the proposed method significantly outperforms the state-of-the-art CNN-based view synthesis method.

[1]  Amnon Shashua,et al.  Algebraic Functions For Recognition , 1995, IEEE Trans. Pattern Anal. Mach. Intell..

[2]  Takeo Kanade,et al.  Multi-PIE , 2008, 2008 8th IEEE International Conference on Automatic Face & Gesture Recognition.

[3]  Frédo Durand,et al.  A gentle introduction to bilateral filtering and its applications , 2007, SIGGRAPH Courses.

[4]  Changchang Wu,et al.  Structure from Motion Using Structure-Less Resection , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[5]  Thaddeus Beier,et al.  Feature-based image metamorphosis , 1998 .

[6]  Steven M. Seitz,et al.  Single-view modelling of free-form scenes , 2002, Comput. Animat. Virtual Worlds.

[7]  Lance Williams,et al.  View Interpolation for Image Synthesis , 1993, SIGGRAPH.

[8]  John Flynn,et al.  Deep Stereo: Learning to Predict New Views from the World's Imagery , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[9]  Scott E. Reed,et al.  Weakly-supervised Disentangling with Recurrent Transformations for 3D View Synthesis , 2015, NIPS.

[10]  Thomas Brox,et al.  FlowNet: Learning Optical Flow with Convolutional Networks , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[11]  Steven M. Seitz,et al.  View morphing , 1996, SIGGRAPH.

[12]  Jan-Michael Frahm,et al.  PatchMatch Based Joint View Selection and Depthmap Estimation , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[13]  Geoffrey E. Hinton,et al.  Transforming Auto-Encoders , 2011, ICANN.

[14]  Frank Chongwoo Park,et al.  A Geometric Particle Filter for Template-Based Visual Tracking , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[15]  Jan-Michael Frahm,et al.  Reconstructing the World* in Six Days *(As Captured by the Yahoo 100 Million Image Dataset) , 2015, CVPR 2015.

[16]  Thomas Brox,et al.  Learning to generate chairs with convolutional neural networks , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[17]  P ? ? ? ? ? ? ? % ? ? ? ? , 1991 .

[18]  Ersin Yumer,et al.  Transformation-Grounded Image Generation Network for Novel 3D View Synthesis , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[19]  Alex Kuefler Deep View Morphing , 2016 .

[20]  Jitendra Malik,et al.  View Synthesis by Appearance Flow , 2016, ECCV.

[21]  Andrew Zisserman,et al.  Spatial Transformer Networks , 2015, NIPS.

[22]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[23]  Hideyuki Tamura,et al.  Viewpoint-dependent stereoscopic display using interpolation of multiviewpoint images , 1995, Electronic Imaging.

[24]  Tomaso A. Poggio,et al.  Model-based matching of line drawings by linear combinations of prototypes , 1995, Proceedings of IEEE International Conference on Computer Vision.

[25]  Marc Levoy,et al.  Light field rendering , 1996, SIGGRAPH.

[26]  Trevor Darrell,et al.  Caffe: Convolutional Architecture for Fast Feature Embedding , 2014, ACM Multimedia.

[27]  Ken-ichi Anjyo,et al.  Tour into the picture: using a spidery mesh interface to make animation from a single image , 1997, SIGGRAPH.

[28]  Leonidas J. Guibas,et al.  ShapeNet: An Information-Rich 3D Model Repository , 2015, ArXiv.

[29]  Jean Ponce,et al.  Accurate, Dense, and Robust Multiview Stereopsis , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[30]  Tomaso A. Poggio,et al.  Linear Object Classes and Image Synthesis From a Single Example Image , 1997, IEEE Trans. Pattern Anal. Mach. Intell..

[31]  Bernhard P. Wrobel,et al.  Multiple View Geometry in Computer Vision , 2001 .

[32]  Yoshua Bengio,et al.  Understanding the difficulty of training deep feedforward neural networks , 2010, AISTATS.

[33]  Xiaoou Tang,et al.  Video Frame Synthesis Using Deep Voxel Flow , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[34]  Richard Szeliski,et al.  The lumigraph , 1996, SIGGRAPH.

[35]  Thomas Brox,et al.  Multi-view 3D Models from Single Images with a Convolutional Network , 2015, ECCV.

[36]  Changchang Wu,et al.  Towards Linear-Time Incremental Structure from Motion , 2013, 2013 International Conference on 3D Vision.

[37]  Geoffrey E. Hinton,et al.  Transforming Autoencoders , 2011 .

[38]  Alexei A. Efros,et al.  Automatic photo pop-up , 2005, SIGGRAPH 2005.