Dynamic Appearance Modelling from Minimal Cameras

We present a novel method for modelling dynamic texture appearance from a minimal set of cameras. Previous methods for capturing the dynamic appearance of a human from multi-view video have relied on large, expensive camera setups, and typically store texture on a frame-by-frame basis. We fit a parameterised human body model to multi-view video from as few as three cameras, and combine the partial texture observations from multiple viewpoints and frames in a learned framework that generates full-body textures with dynamic details for a given input pose. Key to our method are our multi-band loss functions, which apply separate blending functions to the high and low spatial frequencies to reduce texture artefacts. We evaluate our method on a range of multi-view datasets, and show that our model accurately produces full-body dynamic textures even with only partial camera coverage. We demonstrate that our method outperforms other texture generation methods on minimal camera setups.
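
The multi-band idea can be illustrated with a minimal sketch. This is not the formulation used in the paper, whose blending functions are not specified in the abstract. It assumes a single-channel texture stored as a nested list, a box blur as the low-pass filter, a squared error on the low band, an absolute error on the high band, and two hypothetical per-texel weight maps, `w_low` and `w_high`, standing in for the separate blending functions. All function and parameter names are illustrative.

```python
def box_blur(img, radius=1):
    """Low-pass filter: mean over a (2r+1)x(2r+1) window, clamped at borders."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            ys = range(max(0, y - radius), min(h, y + radius + 1))
            xs = range(max(0, x - radius), min(w, x + radius + 1))
            vals = [img[j][i] for j in ys for i in xs]
            out[y][x] = sum(vals) / len(vals)
    return out


def split_bands(img, radius=1):
    """Split an image into a low band (blurred) and a high band (residual)."""
    low = box_blur(img, radius)
    high = [[p - l for p, l in zip(row, lrow)] for row, lrow in zip(img, low)]
    return low, high


def weighted_error(a, b, weights, power):
    """Weighted mean of |a - b| ** power; returns 0.0 if all weights are zero."""
    num = den = 0.0
    for ra, rb, rw in zip(a, b, weights):
        for pa, pb, wt in zip(ra, rb, rw):
            num += wt * abs(pa - pb) ** power
            den += wt
    return num / den if den > 0 else 0.0


def multiband_loss(pred, target, w_low, w_high, radius=1):
    """Sum of per-band losses, each with its own (assumed) weight map."""
    pred_low, pred_high = split_bands(pred, radius)
    tgt_low, tgt_high = split_bands(target, radius)
    low_term = weighted_error(pred_low, tgt_low, w_low, power=2)     # smooth colour
    high_term = weighted_error(pred_high, tgt_high, w_high, power=1)  # fine detail
    return low_term + high_term


# Usage: a 4x4 predicted texture compared against a partially observed target.
pred = [[0.5] * 4 for _ in range(4)]
target = [[0.5, 0.5, 0.9, 0.9] for _ in range(4)]
soft = [[1.0, 1.0, 0.5, 0.5] for _ in range(4)]   # e.g. soft view-angle weights
hard = [[1.0, 1.0, 0.0, 0.0] for _ in range(4)]   # e.g. best-view-only mask
loss = multiband_loss(pred, target, soft, hard)
```

In this sketch the low band tolerates soft weights, because averaging smooth colour across views rarely produces visible seams, while the high band uses a hard mask so that misaligned detail from several views is not blurred together. A practical implementation would also need to handle unobserved texels before filtering, which the box blur here ignores.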

[1]  Hao Li,et al.  PIFu: Pixel-Aligned Implicit Function for High-Resolution Clothed Human Digitization , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[2]  Hao Li,et al.  SiCloPe: Silhouette-Based Clothed People , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[3]  Edmond Boyer,et al.  Video Based Animation Synthesis with the Essential Graph , 2015, 2015 International Conference on 3D Vision.

[4]  Michael Werman,et al.  Multiresolution Textures from Image Sequences , 1997, IEEE Computer Graphics and Applications.

[5]  Adam Baumberg,et al.  Blending Images for Texturing 3D Models , 2002, BMVC.

[6]  Weihong Deng,et al.  Very deep convolutional neural network based image classification using small training sample size , 2015, 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR).

[7]  Victor Lempitsky,et al.  Textured Neural Avatars , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[8]  Alvaro Collet,et al.  High-quality streamable free-viewpoint video , 2015, ACM Trans. Graph..

[9]  Yi Yang,et al.  Self-Correction for Human Parsing , 2019, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[10]  Hao Li,et al.  paGAN: real-time avatars using dynamic textures , 2019, ACM Trans. Graph..

[11]  Marcus A. Magnor,et al.  Learning to Reconstruct People in Clothing From a Single RGB Camera , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[12]  Winston H. Hsu,et al.  Free-Form Video Inpainting With 3D Gated Convolution and Temporal PatchGAN , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[13]  Holly E. Rushmeier,et al.  High-Quality Texture Reconstruction from Multiple Scans , 2001, IEEE Trans. Vis. Comput. Graph..

[14]  Paul Debevec,et al.  Modeling and Rendering Architecture from Photographs , 1996, SIGGRAPH 1996.

[15]  Adrian Hilton,et al.  4D video textures for interactive character appearance , 2014, Comput. Graph. Forum.

[16]  Chongyang Ma,et al.  Deep Volumetric Video From Very Sparse Multi-view Performance Capture , 2018, ECCV.

[17]  Zhaolin Chen,et al.  3D Texture Mapping in Multi-view Reconstruction , 2012, ISVC.

[18]  Vladlen Koltun,et al.  Color map optimization for 3D reconstruction with consumer depth cameras , 2014, ACM Trans. Graph..

[19]  Eero P. Simoncelli,et al.  Image quality assessment: from error visibility to structural similarity , 2004, IEEE Transactions on Image Processing.

[20]  Hao Li,et al.  ARCH: Animatable Reconstruction of Clothed Humans , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[21]  Hans-Peter Seidel,et al.  Automatic generation of personalized human avatars from multi-view video , 2005, VRST '05.

[22]  Gerard Pons-Moll,et al.  360-Degree Textures of People in Clothing from a Single Image , 2019, 2019 International Conference on 3D Vision (3DV).

[23]  Christian Theobalt,et al.  MonoPerfCap , 2017, ACM Trans. Graph..

[24]  Marcus A. Magnor,et al.  Video Based Reconstruction of 3D People Models , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[25]  Christian Theobalt,et al.  LiveCap , 2018, ACM Trans. Graph..

[26]  Vagia Tsiminaki,et al.  Eigen Appearance Maps of Dynamic Shapes , 2016, ECCV.

[27]  Yaser Sheikh,et al.  Deep appearance models for face rendering , 2018, ACM Trans. Graph..

[28]  Yinghao Huang,et al.  Towards Accurate Marker-Less Human Shape and Pose Estimation over Time , 2017, 2017 International Conference on 3D Vision (3DV).

[29]  Harry Shum,et al.  Optimal texture map reconstruction from multiple views , 2001, Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. CVPR 2001.

[30]  Li Fei-Fei,et al.  Perceptual Losses for Real-Time Style Transfer and Super-Resolution , 2016, ECCV.

[31]  Michael J. Black,et al.  SMPL: A Skinned Multi-Person Linear Model , 2023 .

[32]  Jochen Wingbermühle,et al.  Automatic reconstruction of 3D objects using a mobile monoscopic camera , 1997, Proceedings. International Conference on Recent Advances in 3-D Digital Imaging and Modeling (Cat. No.97TB100134).

[33]  Ting-Chun Wang,et al.  Image Inpainting for Irregular Holes Using Partial Convolutions , 2018, ECCV.

[34]  Paolo Cignoni,et al.  Multiple Texture Stitching and Blending on 3D Objects , 1999, Rendering Techniques.

[35]  Marcus A. Magnor,et al.  Detailed Human Avatars from Monocular Video , 2018, 2018 International Conference on 3D Vision (3DV).

[36]  Yaser Sheikh,et al.  OpenPose: Realtime Multi-Person 2D Pose Estimation Using Part Affinity Fields , 2018, IEEE Transactions on Pattern Analysis and Machine Intelligence.