Exploring the Use of Skeletal Tracking for Cheaper Motion Graphs and On-Set Decision Making in Free-Viewpoint Video Production

In free-viewpoint video (FVV), the motion and surface appearance of a real-world performance are captured as an animated mesh. While this technology can produce high-fidelity recreations of actors, the required 3D reconstruction step has substantial processing demands. As a result, FVV experiences are currently expensive to produce, and the processing delay leaves on-set decisions hampered by a lack of feedback. This work explores the possibility of using RGB-camera-based skeletal tracking to reduce the amount of content that must be 3D reconstructed and to aid on-set decision making. One particularly relevant application is the construction of Motion Graphs, where state-of-the-art techniques require large amounts of content to be 3D reconstructed before a graph can be built, resulting in large amounts of wasted processing effort. Here, we propose using skeletons to assess which clips of FVV content to process, resulting in substantial cost savings with a limited impact on performance accuracy. Additionally, we explore how this technique could be used on set to reduce the likelihood of expensive reshoots.
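The idea of using skeletons to decide which clips to reconstruct can be pictured with a minimal sketch. This is not the method used in this work. It assumes that 3D joint positions are already available for every frame (for instance, from an RGB-camera-based tracker), that pose similarity is a mean joint distance after centring each pose on its root joint, and that a clip is worth reconstructing if it has at least one candidate transition to another clip. All function names, the root joint index, and the threshold are hypothetical.

```python
import math
from itertools import combinations

# A pose is a list of (x, y, z) joint positions; a clip is a list of poses.
# Assumption: joint 0 is the root (e.g. the pelvis) in every pose.

def centre(pose, root=0):
    """Translate a pose so that its root joint sits at the origin."""
    rx, ry, rz = pose[root]
    return [(x - rx, y - ry, z - rz) for x, y, z in pose]

def pose_distance(a, b):
    """Mean Euclidean distance between corresponding root-centred joints."""
    ca, cb = centre(a), centre(b)
    return sum(math.dist(p, q) for p, q in zip(ca, cb)) / len(ca)

def best_transition(clip_a, clip_b):
    """Return (distance, frame_in_a, frame_in_b) for the closest pose pair."""
    return min(
        (pose_distance(pa, pb), i, j)
        for i, pa in enumerate(clip_a)
        for j, pb in enumerate(clip_b)
    )

def clips_worth_reconstructing(clips, threshold):
    """Names of clips with at least one candidate transition to another clip."""
    keep = set()
    for (name_a, a), (name_b, b) in combinations(clips.items(), 2):
        dist, _, _ = best_transition(a, b)
        if dist <= threshold:  # hypothetical similarity threshold
            keep.update((name_a, name_b))
    return keep

# Toy two-joint skeletons (root and hand), purely illustrative data.
clips = {
    "walk": [[(0, 0, 0), (1, 0, 0)], [(0, 0, 0), (1, 0.1, 0)]],
    "wave": [[(5, 0, 0), (6, 0.1, 0)]],
    "jump": [[(0, 0, 0), (0, 0, 3)]],
}
selected = clips_worth_reconstructing(clips, threshold=0.05)
```

In this toy data, "walk" and "wave" share a pose once the root offset is removed, so both are selected, while "jump" has no similar pose and would be skipped. A practical measure would likely also account for body orientation and joint velocity, which this sketch ignores.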
