Learning to Produce 3D Media From a Captured 2D Video

Advances in display technologies and the commercial success of 3D motion pictures in recent years have renewed interest in enabling consumers to create 3D content. While new 3D content can be created using more advanced capture devices (e.g., stereo cameras), most people still own only 2D capture devices. Further, very large collections of captured media exist only in 2D. We present a system for producing pseudo-stereo images from captured 2D videos. Our system employs a two-phase procedure. The first phase detects "good" pseudo-stereo image frames in a 2D video that was captured a priori without any constraints on camera motion or content. We use a trained classifier to detect pairs of video frames that are suitable for constructing pseudo-stereo images. In particular, for a given frame I_t at time t, we determine whether an offset Δt exists such that I_{t+Δt} and I_t can form an acceptable pseudo-stereo image. Even once Δt is determined, generating a good pseudo-stereo image from captured 2D video frames can be nontrivial, since in many videos, professional or amateur, both foreground and background objects may undergo complex motion. Independent foreground motions of different scene objects define different epipolar geometries, which cause the conventional method of generating pseudo-stereo images to fail. To address this problem, the second phase of the proposed system recomposes the frame pairs to ensure consistent 3D perception of objects in such cases. In this phase, the final left and right pseudo-stereo images are created by recompositing different regions of the initial frame pairs to ensure a consistent camera geometry. We verify the performance of our method for producing pseudo-stereo media from captured 2D videos in a psychovisual evaluation using both professional movie clips and amateur home videos.
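As an illustration only, the first-phase search described above (for a frame at time t, look for a temporal offset to a second frame that pairs well with it) might be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the function name `find_stereo_partner`, the forward-only search window `max_offset`, the acceptance `threshold`, and the `score_pair` callable (a stand-in for the trained classifier) are all hypothetical.

```python
from typing import Callable, Optional, Sequence, Tuple, TypeVar

Frame = TypeVar("Frame")


def find_stereo_partner(
    frames: Sequence[Frame],
    t: int,
    score_pair: Callable[[Frame, Frame], float],
    max_offset: int = 10,      # assumed search window, not from the paper
    threshold: float = 0.5,    # assumed acceptance threshold, not from the paper
) -> Optional[Tuple[int, float]]:
    """Return (delta_t, score) for the best acceptable partner of frame t.

    `score_pair` is a hypothetical stand-in for a trained classifier that
    rates how suitable two frames are as a pseudo-stereo pair (higher is
    better). Returns None when no offset in the window reaches `threshold`.
    """
    best: Optional[Tuple[int, float]] = None
    for delta_t in range(1, max_offset + 1):
        if t + delta_t >= len(frames):
            break  # ran past the end of the video
        score = score_pair(frames[t], frames[t + delta_t])
        # Keep the offset only if it is acceptable and beats the current best.
        if score >= threshold and (best is None or score > best[1]):
            best = (delta_t, score)
    return best
```

A caller would run this for each candidate frame and pass any accepted pair on to the second, recomposition phase. Searching only forward in time and keeping the single highest-scoring offset are simplifications chosen for this sketch.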
