Exploiting spatial-temporal constraints for interactive animation control

Interactive control of human characters would enable intuitive character control in computer and video games, the control of avatars for virtual reality, electronically mediated communication or teleconferencing, and the rapid prototyping of character animations for movies. To be useful, such a system must be capable of controlling a lifelike character interactively, precisely, and intuitively. Building an animation system for home use is particularly challenging because the system must also be low cost and must not require a considerable amount of time, skill, or artistry to assemble. This thesis explores an approach that exploits a number of different spatial-temporal constraints for interactive animation control. The control inputs from such a system will often be low dimensional, containing far less information than actual human motion; thus they cannot be used directly for precise control of high-dimensional characters. However, natural human motion is highly constrained: the movements of the degrees of freedom of the limbs or facial expressions are not independent. Our hypothesis is that the knowledge about natural human motion embedded in a domain-specific motion capture database can be used to transform under-constrained user input into realistic human motion. The spatial-temporal coherence embedded in the motion data allows us to control high-dimensional human animations with low-dimensional user input. We demonstrate the power and flexibility of this approach through three applications: controlling detailed three-dimensional (3D) facial expressions using a single video camera, controlling complex 3D full-body movements using two synchronized video cameras and a very small number of retro-reflective markers, and controlling realistic facial expressions or full-body motions using a sparse set of intuitive constraints defined throughout the motion. For all three systems, we assess the quality of the results by comparison with animations created by a commercial optical motion capture system. We show that the quality of the animation produced by all three systems is comparable to that of commercial motion capture while requiring less expense, time, and space to capture the user's input.
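The core idea, mapping a low-dimensional control signal to a high-dimensional pose through a motion capture prior, is only described at a high level above. As a loose illustration of that data-driven idea, the sketch below blends the nearest examples from a database of (control feature, full pose) pairs using distance-based weights. This is a minimal nearest-neighbor sketch, not the method developed in the thesis (which builds more sophisticated local models of the database); all function and variable names here are hypothetical.

```python
import numpy as np

def reconstruct_pose(control, db_controls, db_poses, k=20, eps=1e-8):
    """Map a low-dimensional control vector to a full-body pose by blending
    the k nearest database examples with inverse-distance weights.

    control     : (c,)   low-dimensional input, e.g. a few marker positions
    db_controls : (n, c) control features extracted from the mocap database
    db_poses    : (n, d) corresponding high-dimensional poses, with d >> c
    """
    # Distance from the query to every example's control feature.
    dists = np.linalg.norm(db_controls - control, axis=1)
    nearest = np.argsort(dists)[:k]          # indices of the k closest examples
    weights = 1.0 / (dists[nearest] + eps)   # closer examples count more
    weights /= weights.sum()                 # normalize to a convex blend
    return weights @ db_poses[nearest]       # weighted blend of stored poses

# Toy usage: 3-dimensional control input, 60-dimensional pose, 1000 examples.
rng = np.random.default_rng(0)
db_controls = rng.standard_normal((1000, 3))
db_poses = rng.standard_normal((1000, 60))
pose = reconstruct_pose(rng.standard_normal(3), db_controls, db_poses)
```

Because the blended output stays inside the convex hull of nearby database poses, even a crude scheme like this inherits the spatial coherence of the captured data, which is why an under-constrained input can still yield a plausible full-body configuration.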
