Paper information - Volumetric performance capture from minimal camera viewpoints

Volumetric performance capture from minimal camera viewpoints

We present a convolutional autoencoder that enables high-fidelity volumetric reconstruction of human performance from multi-view video comprising only a small set of camera views. Our method yields end-to-end reconstruction error similar to that of a probabilistic visual hull computed from at least twice as many viewpoints. It relies on a deep prior learned implicitly by the autoencoder, which is trained on a dataset of view-ablated multi-view video footage covering a wide range of subjects and actions. This opens up the possibility of high-end volumetric performance capture in on-set and prosumer scenarios where time or cost prohibits a high witness camera count.
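To make the effect of view ablation concrete, the following is a minimal toy sketch, not the paper's pipeline. It carves a binary visual hull of a synthetic sphere from axis-aligned orthographic silhouettes and compares the hull built from three views with the hull built from one. The grid size `N`, the sphere, the orthographic cameras, and all function names are assumptions made for illustration; the abstract refers to a probabilistic visual hull from real multi-view video, which this sketch does not reproduce.

```python
from itertools import product

N = 16  # voxels per side of the toy grid (illustrative assumption)


def make_sphere(n=N, radius=0.4):
    """Hypothetical ground truth: a sphere centred in the voxel grid."""
    c = (n - 1) / 2.0
    r = radius * n
    return {
        (x, y, z)
        for x, y, z in product(range(n), repeat=3)
        if (x - c) ** 2 + (y - c) ** 2 + (z - c) ** 2 <= r * r
    }


def project(voxel, axis):
    """Orthographic projection: drop the coordinate along the viewing axis."""
    return tuple(voxel[i] for i in range(3) if i != axis)


def silhouette(voxels, axis):
    """Binary silhouette of the shape as seen along one axis."""
    return {project(v, axis) for v in voxels}


def visual_hull(silhouettes, n=N):
    """Keep every voxel whose projection lies inside all given silhouettes."""
    return {
        v
        for v in product(range(n), repeat=3)
        if all(project(v, axis) in sil for axis, sil in silhouettes.items())
    }


def volume_error(hull, truth):
    """Number of voxels on which the hull and the ground truth disagree."""
    return len(hull ^ truth)


truth = make_sphere()
sils = {axis: silhouette(truth, axis) for axis in range(3)}

hull_3 = visual_hull(sils)            # all three toy views
hull_1 = visual_hull({0: sils[0]})    # view-ablated: a single view

print("error with 3 views:", volume_error(hull_3, truth))
print("error with 1 view: ", volume_error(hull_1, truth))
```

In this toy setting the single-view hull is a strict superset of the three-view hull, which in turn over-estimates the true shape. A coarse, view-ablated volume of this kind is what a learned prior would be asked to refine.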
