GMM-based synchronization rules for HMM-based audio-visual laughter synthesis

In this paper we propose synchronization rules between acoustic and visual laughter synthesis systems. Previous work has addressed acoustic and visual laughter synthesis separately, each following an HMM-based approach. Synchronization rules are needed because, unlike audio-visual speech synthesis, HMM-based laughter synthesis cannot rely on a unified system in which a common transcription drives both modalities; the acoustic and visual models are therefore trained independently, without any synchronization constraints. In this work, we propose rules derived from the analysis of audio and visual laughter transcriptions that make it possible to generate a visual laughter transcription corresponding to given audio laughter data.
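As an illustration of the general idea behind GMM-based synchronization rules, the sketch below fits a joint Gaussian mixture over paired audio and visual laughter segment durations and predicts a visual duration from an audio one via the conditional expectation of the mixture. The features (segment durations), the number of components, and the use of scikit-learn's GaussianMixture are assumptions for this example only; the paper's actual rules and features are not reproduced here.

```python
# Hypothetical sketch: GMM regression from audio laughter segment durations
# to visual laughter segment durations. Toy data and feature choice are
# illustrative assumptions, not the paper's actual configuration.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Toy joint observations: (audio segment duration, visual segment duration), in seconds.
audio_dur = rng.uniform(0.1, 0.6, size=200)
visual_dur = 1.2 * audio_dur + rng.normal(0.0, 0.02, size=200)  # fake correlation
joint = np.column_stack([audio_dur, visual_dur])

# Fit a joint GMM over the paired durations.
gmm = GaussianMixture(n_components=3, covariance_type="full", random_state=0).fit(joint)

def predict_visual_duration(x):
    """Conditional expectation E[visual | audio = x] under the joint GMM."""
    means, covs, weights = gmm.means_, gmm.covariances_, gmm.weights_
    # Responsibility of each component given the audio duration alone.
    resp = np.array([
        w * np.exp(-0.5 * (x - m[0]) ** 2 / c[0, 0]) / np.sqrt(2 * np.pi * c[0, 0])
        for w, m, c in zip(weights, means, covs)
    ])
    resp /= resp.sum()
    # Per-component conditional mean of the visual duration given the audio duration.
    cond_means = np.array([
        m[1] + c[1, 0] / c[0, 0] * (x - m[0]) for m, c in zip(means, covs)
    ])
    return float(resp @ cond_means)

print(predict_visual_duration(0.3))  # predicted visual duration for a 0.3 s audio segment
```

In practice such a mapping would be derived from aligned audio and visual laughter transcriptions rather than synthetic data, and the predicted timings would then constrain the visual transcription used by the HMM-based visual synthesizer.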
