Reconstructing Animatable Categories from Videos

Building animatable 3D models is challenging because it requires 3D scans, laborious registration, and manual rigging, none of which scale easily to arbitrary categories. Recently, differentiable rendering has provided a pathway to obtain high-quality 3D models from monocular videos, but existing methods are limited to rigid categories or single instances. We present RAC, which builds category 3D models from monocular videos while disentangling variations over instances and motion over time. Three key ideas make this possible: (1) specializing a skeleton to instances via optimization, (2) a latent space regularization method that encourages shared structure across a category while maintaining instance details, and (3) using 3D background models to disentangle objects from the background. We show that 3D models of humans, cats, and dogs can be learned from 50-100 internet videos.
