Social Interactive Human Video Synthesis

In this paper, we propose a computational model for social interaction between three people in a conversation, and demonstrate results using human video motion synthesis. We utilised semi-supervised computer vision techniques to label social signals between the people, such as laughter, head nods and gaze direction. Data mining is used to deduce frequently occurring patterns of social signals between a speaker and a listener in both interested and uninterested social scenarios, and the mined confidence values are used as conditional probabilities to animate social responses. Human video motion synthesis is performed using an appearance model to learn a multivariate probability distribution, combined with a transition matrix to derive the likelihood of motion given a pose configuration. Our system uses social labels to define motion transitions more accurately and to build a texture motion graph. Traditional motion synthesis algorithms are best suited to large human movements such as walking and running, where motion variations are large and prominent; our method instead focuses on generating subtler human movements such as head nods. The user can then control who speaks and the interest level of each listener, resulting in socially interactive conversational agents.
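The step that turns mined confidence values into listener behaviour can be illustrated with a small sketch. It assumes, hypothetically, that each observation window has been reduced to a set of social-signal labels and that separate windows are kept for the interested and uninterested scenarios. The labels, the data and the function names (`rule_confidence`, `response_probabilities`, `sample_responses`) are illustrative, not the authors' implementation. Confidence uses the standard association-rule definition, confidence(A → B) = support(A ∪ B) / support(A), and each confidence is read as P(listener response | speaker signal). Triggering every response independently is a simplifying assumption made here.

```python
import random

# Hypothetical mined data: each window is the set of social signals observed
# together (speaker signal plus listener responses), kept separately for the
# "interested" and "not interested" scenarios. Labels are illustrative only.
SCENARIOS = {
    "interested": [
        {"speaker:laugh", "listener:laugh", "listener:gaze_at_speaker"},
        {"speaker:laugh", "listener:laugh"},
        {"speaker:laugh", "listener:gaze_at_speaker"},
        {"speaker:nod", "listener:nod"},
    ],
    "not_interested": [
        {"speaker:laugh", "listener:gaze_away"},
        {"speaker:laugh"},
        {"speaker:laugh", "listener:laugh", "listener:gaze_away"},
    ],
}


def rule_confidence(windows, antecedent, consequent):
    """Confidence of the association rule A -> B: support(A u B) / support(A)."""
    n_a = sum(1 for w in windows if antecedent <= w)
    if n_a == 0:
        return 0.0  # the rule never fires, so treat the response as impossible
    n_ab = sum(1 for w in windows if (antecedent | consequent) <= w)
    return n_ab / n_a


def response_probabilities(windows, speaker_signal, responses):
    """Read each rule confidence as P(listener response | speaker signal)."""
    a = frozenset({speaker_signal})
    return {r: rule_confidence(windows, a, frozenset({r})) for r in responses}


def sample_responses(probs, rng):
    """Trigger each listener response independently with its probability (assumption)."""
    return [r for r, p in sorted(probs.items()) if rng.random() < p]


if __name__ == "__main__":
    rng = random.Random(0)
    responses = ["listener:laugh", "listener:gaze_at_speaker", "listener:gaze_away"]
    for interest in ("interested", "not_interested"):
        probs = response_probabilities(SCENARIOS[interest], "speaker:laugh", responses)
        print(interest, probs, sample_responses(probs, rng))
```

In this sketch, switching the scenario key from "interested" to "not_interested" is how a user-selected interest level would change which responses a listener tends to produce when the speaker laughs.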
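The pairing of an appearance-space probability distribution with a transition matrix can be sketched as follows. This is a simplified stand-in, not the paper's model: it assumes each frame has already been projected to a low-dimensional appearance-parameter vector and assigned a pose-cluster label, models each cluster with a diagonal-covariance Gaussian, and scores a candidate next pose as log P(next cluster | current cluster) + log p(x | next cluster). All function names are hypothetical.

```python
import math

# Hypothetical input: each frame is a low-dimensional appearance-parameter
# vector with a pose-cluster label in 0..K-1 (clustering assumed done already).


def fit_transition_matrix(labels, n_states, alpha=1.0):
    """First-order matrix P(next cluster | current) with add-alpha smoothing."""
    counts = [[alpha] * n_states for _ in range(n_states)]
    for a, b in zip(labels, labels[1:]):
        counts[a][b] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        matrix.append([c / total for c in row])
    return matrix


def fit_diag_gaussians(vectors, labels, n_states, min_var=1e-6):
    """Per-cluster mean and diagonal variance of the appearance vectors."""
    dim = len(vectors[0])
    params = []
    for k in range(n_states):
        members = [v for v, lab in zip(vectors, labels) if lab == k]
        if not members:
            raise ValueError(f"cluster {k} has no frames")
        mean = [sum(v[d] for v in members) / len(members) for d in range(dim)]
        # Floor the variance so a constant dimension cannot divide by zero.
        var = [
            max(sum((v[d] - mean[d]) ** 2 for v in members) / len(members), min_var)
            for d in range(dim)
        ]
        params.append((mean, var))
    return params


def log_gaussian(x, mean, var):
    """Log density of a diagonal-covariance Gaussian."""
    return -0.5 * sum(
        math.log(2 * math.pi * s) + (xi - m) ** 2 / s for xi, m, s in zip(x, mean, var)
    )


def motion_log_likelihood(current, candidate, x, trans, params):
    """log P(candidate | current) + log p(x | candidate)."""
    mean, var = params[candidate]
    return math.log(trans[current][candidate]) + log_gaussian(x, mean, var)


def most_likely_next(current, x, trans, params):
    """Cluster that best explains appearance vector x when following `current`."""
    return max(
        range(len(params)),
        key=lambda k: motion_log_likelihood(current, k, x, trans, params),
    )


if __name__ == "__main__":
    labels = [0, 0, 1, 1, 0, 1]
    vectors = [[0.0, 0.0], [0.2, 0.0], [2.0, 2.0], [2.2, 2.0], [0.1, 0.0], [2.1, 2.0]]
    trans = fit_transition_matrix(labels, 2)
    params = fit_diag_gaussians(vectors, labels, 2)
    print(trans, most_likely_next(0, [2.1, 2.0], trans, params))
```

Smoothing with alpha > 0 keeps every transition probability non-zero, so the logarithm is always defined. A full-covariance model would capture correlations between appearance parameters that this diagonal version ignores.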
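One way social labels could constrain transitions in a texture motion graph is sketched below, following the video-textures idea: playback normally moves from frame i to i+1, and a jump from i to j is allowed when frame j resembles frame i+1. The extra rule that frame j must also carry the same social label as frame i+1 is an assumption for illustration, as are the feature vectors, the distance threshold and the function names; the abstract does not specify the actual transition criteria.

```python
import math
import random


def build_motion_graph(features, labels, threshold):
    """Adjacency list over frame indices.

    Edge i -> i+1 is normal playback. A jump i -> j is added when frame j looks
    like frame i+1 (Euclidean distance below `threshold`) AND carries the same
    social label, so that, e.g., a nod is never spliced into a laugh
    (hypothetical rule).
    """
    n = len(features)
    graph = {i: [] for i in range(n)}
    for i in range(n - 1):
        graph[i].append(i + 1)
        for j in range(n):
            if j == i + 1 or labels[j] != labels[i + 1]:
                continue
            if math.dist(features[i + 1], features[j]) < threshold:
                graph[i].append(j)
    return graph


def random_walk(graph, start, steps, rng):
    """Play out a frame sequence by following random edges; stop at dead ends."""
    path = [start]
    for _ in range(steps):
        successors = graph[path[-1]]
        if not successors:
            break
        path.append(rng.choice(successors))
    return path


if __name__ == "__main__":
    features = [[0.0], [1.0], [2.0], [1.02], [3.0]]
    labels = ["neutral", "nod", "nod", "nod", "neutral"]
    graph = build_motion_graph(features, labels, threshold=0.1)
    print(graph)  # {0: [1, 3], 1: [2], 2: [3, 1], 3: [4], 4: []}
    print(random_walk(graph, 0, 10, random.Random(0)))
```

Relabelling frame 3 as a laugh in this example removes the jump 0 -> 3, which is the kind of socially implausible splice the label constraint is meant to rule out.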
