Paper information - Mononizing binocular videos

Mononizing binocular videos

This paper presents the idea of mono-nizing binocular videos and a framework to effectively realize it. Mono-nize means we purposely convert a binocular video into a regular monocular video with the stereo information implicitly encoded in a visual but nearly-imperceptible form. Hence, we can impartially distribute and show the mononized video as an ordinary monocular video. Unlike ordinary monocular videos, we can restore from it the original binocular video and show it on a stereoscopic display. To start, we formulate an encoding-and-decoding framework with the pyramidal deformable fusion module to exploit long-range correspondences between the left and right views, a quantization layer to suppress the restoring artifacts, and the compression noise simulation module to resist the compression noise introduced by modern video codecs. Our framework is self-supervised, as we articulate our objective function with loss terms defined on the input: a monocular term for creating the mononized video, an invertibility term for restoring the original video, and a temporal term for frame-to-frame coherence. Further, we conducted extensive experiments to evaluate our generated mononized videos and restored binocular videos for diverse types of images and 3D movies. Quantitative results on both standard metrics and user perception studies show the effectiveness of our method.
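The three loss terms named in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not give the exact formulas, so mean squared error stands in for every term, the temporal term is approximated by comparing frame-to-frame changes, and the function names and weights (`w_mono`, `w_inv`, `w_temp`) are hypothetical. Frames are flat lists of intensities in [0, 1], and `quantize` only mimics what storing a frame at 8 bits does to the values.

```python
# Illustrative sketch only; not the authors' implementation.
# Frames are flat lists of pixel intensities in [0, 1].


def mse(a, b):
    """Mean squared error between two equally sized frames."""
    assert len(a) == len(b) and len(a) > 0
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def quantize(frame, levels=256):
    """Clamp to [0, 1] and snap each value to one of `levels` evenly spaced steps."""
    top = levels - 1
    return [round(min(max(x, 0.0), 1.0) * top) / top for x in frame]


def frame_delta(cur, prev):
    """Per-pixel change between two consecutive frames."""
    return [c - p for c, p in zip(cur, prev)]


def mononizing_loss(left, right, mono, rec_left, rec_right,
                    prev_left, prev_mono,
                    w_mono=1.0, w_inv=1.0, w_temp=1.0):
    """Weighted sum of the three terms; all weights are assumed values."""
    # Monocular term: the mononized frame should look like an ordinary view
    # (here assumed to be the left view).
    monocular = mse(mono, left)
    # Invertibility term: both restored views should match the originals.
    invertibility = mse(rec_left, left) + mse(rec_right, right)
    # Temporal term (assumed form): the mononized video should change over
    # time the same way the reference view does.
    temporal = mse(frame_delta(mono, prev_mono), frame_delta(left, prev_left))
    total = w_mono * monocular + w_inv * invertibility + w_temp * temporal
    terms = {"monocular": monocular,
             "invertibility": invertibility,
             "temporal": temporal}
    return total, terms


if __name__ == "__main__":
    left, right = [0.2, 0.4], [0.1, 0.3]
    prev = [0.0, 0.0]
    mono = quantize([0.21, 0.39])  # a slightly perturbed, quantized encoding
    total, terms = mononizing_loss(left, right, mono, left, right, prev, prev)
    print(total, terms)
```

Every term is computed from the input video alone, which is why the abstract can call the framework self-supervised. In a trained network the rounding step would need a differentiable treatment, since rounding has a zero gradient almost everywhere.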
