Expressive talking avatar synthesis and animation

The talking avatar, an animated speaking virtual character with a vivid human-like appearance and real or synthetic speech, has gradually shown its potential in applications involving human-computer intelligent interaction. A talking avatar has rich communication abilities, delivering verbal and nonverbal information through voice, tone, eye contact, head motion, facial expressions, and more. Avatars are increasingly used on a variety of electronic devices, such as computers, smartphones, tablets, kiosks and game consoles. They can also be found across many domains, such as technical support and customer service, communication aids, speech therapy, virtual reality, film special effects, education and training [6]. Specific applications include a virtual storyteller for children, a virtual guide or presenter for a personal or commercial website, a representative of the user in computer games, and an entertaining puppet for computer-mediated human communication. Talking avatars thus show clear promise as an expressive multimodal interface for human-computer interaction.

[1] Dong Yu, et al. Deep Learning: Methods and Applications, 2014, Found. Trends Signal Process.

[2] Hichem Sahli, et al. Gibberish speech as a tool for the study of affective expressiveness for robotic agents, 2014, Multimedia Tools and Applications.

[3] Zhigang Deng, et al. Live Speech Driven Head-and-Eye Motion Generators, 2012, IEEE Transactions on Visualization and Computer Graphics.

[4] Dongmei Jiang, et al. Relevance units machine based dimensional and continuous speech emotion prediction, 2014, Multimedia Tools and Applications.

[5] Lei Xie, et al. Head motion synthesis from speech using deep neural networks, 2015, Multimedia Tools and Applications.

[6] Frank K. Soong, et al. HMM trajectory-guided sample selection for photo-realistic talking head, 2014, Multimedia Tools and Applications.

[7] Bin Liu, et al. User behavior fusion in dialog management with multi-modal history cues, 2015, Multimedia Tools and Applications.

[8] Lei Xie, et al. Realistic Mouth-Synching for Speech-Driven Talking Face Using Articulatory Modelling, 2007, IEEE Transactions on Multimedia.

[9] Frank K. Soong, et al. Text Driven 3D Photo-Realistic Talking Head, 2011, INTERSPEECH.

[10] Tony Ezzat, et al. Trainable videorealistic speech animation, 2002, Sixth IEEE International Conference on Automatic Face and Gesture Recognition, 2004. Proceedings.

[11] Lianhong Cai, et al. Generating emphatic speech with hidden Markov model for expressive speech synthesis, 2014, Multimedia Tools and Applications.

[12] Jörn Ostermann, et al. Lifelike talking faces for interactive services, 2003, Proc. IEEE.

[13] Kai Zhao, et al. Acoustic to articulatory mapping with deep neural network, 2014, Multimedia Tools and Applications.

[14] Haizhou Li, et al. Exemplar-based voice conversion using joint nonnegative matrix factorization, 2015, Multimedia Tools and Applications.

[15] Hichem Sahli, et al. Recognition of facial actions and their temporal segments based on duration models, 2014, Multimedia Tools and Applications.

[16] Lei Xie, et al. A statistical parametric approach to video-realistic text-driven talking avatar, 2013, Multimedia Tools and Applications.

[17] Keiichi Tokuda, et al. Using speaker adaptive training to realize Mandarin-Tibetan cross-lingual speech synthesis, 2014, Multimedia Tools and Applications.