Analysis and synthesis of multiview audio-visual dance figures

This paper presents a framework for audio-driven human body motion analysis and synthesis. The video is analyzed to capture the time-varying posture of the dancer's body, while the musical audio signal is processed to extract beat information. The human body posture is extracted from multiview video without any human intervention, using a novel marker-based algorithm built on annealing particle filtering. The dancer's body movements are characterized by a set of recurring semantic motion patterns, i.e., dance figures. Each dance figure is modeled in a supervised manner by a set of Hidden Markov Model (HMM) structures together with its associated beat frequency. In synthesis, given an audio signal of a learned musical type, the motion parameters of the corresponding dance figures are generated by the trained HMM structures in synchrony with the input audio, based on the estimated tempo. Finally, the generated motion parameters are animated along with the musical audio using a graphical animation tool. Experimental results demonstrate the effectiveness of the proposed framework.
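As a rough illustration of the synthesis stage only (not the authors' implementation), the sketch below assumes librosa for tempo/beat estimation and hmmlearn's GaussianHMM as a stand-in for the paper's per-figure HMM structures; one HMM is fit per dance figure from example motion-parameter sequences and then sampled in synchrony with the detected beats. The function names, the fixed frames-per-beat timing, and the diagonal-covariance model are illustrative assumptions.

    # Minimal sketch (not the paper's implementation): beat-synchronous motion
    # synthesis from a per-figure HMM, assuming librosa and hmmlearn are available.
    import numpy as np
    import librosa
    from hmmlearn.hmm import GaussianHMM

    def train_figure_hmm(motion_sequences, n_states=8):
        """Fit one HMM per dance figure from example motion-parameter sequences.

        motion_sequences: list of (T_i, D) arrays of body-posture parameters
        extracted from multiview video for one recurring dance figure.
        """
        X = np.vstack(motion_sequences)
        lengths = [len(seq) for seq in motion_sequences]
        hmm = GaussianHMM(n_components=n_states, covariance_type="diag", n_iter=50)
        hmm.fit(X, lengths)
        return hmm

    def synthesize_motion(audio_path, figure_hmm, frames_per_beat=12):
        """Estimate the tempo of the input audio and sample motion parameters
        from the figure's HMM so the synthesized figure follows the beat."""
        y, sr = librosa.load(audio_path)
        tempo, beat_frames = librosa.beat.beat_track(y=y, sr=sr)
        beat_times = librosa.frames_to_time(beat_frames, sr=sr)

        motion = []
        for _ in beat_times:
            # Sample a fixed number of posture frames per detected beat so the
            # generated figure stays in synchrony with the estimated tempo.
            frames, _ = figure_hmm.sample(frames_per_beat)
            motion.append(frames)
        return tempo, beat_times, np.vstack(motion)

The sampled posture frames would then be passed to an animation tool, as in the paper's final step; in practice the per-figure HMM topology and the mapping from beats to figure repetitions would follow the paper's supervised training procedure rather than this simplified one-figure-per-beat scheme.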
