VideoPoseVR: Authoring Virtual Reality Character Animations with Online Videos

We present VideoPoseVR, a video-based workflow for authoring character animations in VR from online videos. It uses a state-of-the-art deep learning approach to reconstruct 3D motions from online videos, caption the motions, and store them in a motion dataset. Creators can import videos, search the dataset, modify the motion timeline, and combine motions from multiple videos to author character animations in VR. We implemented a proof-of-concept prototype and conducted a user study to evaluate the feasibility of the video-based authoring approach and to gather initial feedback on the prototype. The study results suggest that VideoPoseVR was easy for novice users to learn and enabled rapid exploration and prototyping of animations for applications such as entertainment, skills training, and crowd simulations.
