HMM trajectory-guided sample selection for photo-realistic talking head

[1]  Zhen-Hua Ling,et al.  DNN-based unit selection using frame-sized speech segments , 2016, 2016 10th International Symposium on Chinese Spoken Language Processing (ISCSLP).

[2]  Gang Chen,et al.  Computer-Assisted Audiovisual Language Learning , 2012, Computer.

[3]  Matthew R. Scott,et al.  Towards a Specialized Search Engine for Language Learners [Point of View] , 2011 .

[4]  Frank K. Soong,et al.  A Sparse and Low-rank approach to efficient face alignment for photo-real talking head synthesis , 2011, 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[5]  Frank K. Soong,et al.  Synthesizing visual speech trajectory with minimum generation error , 2011, 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[6]  Frank K. Soong,et al.  Synthesizing photo-real talking head via trajectory-guided sample selection , 2010, INTERSPEECH.

[7]  Zhi-Jie Yan,et al.  Rich-context Unit Selection (RUS) approach to high quality TTS , 2010, 2010 IEEE International Conference on Acoustics, Speech and Signal Processing.

[8]  Gérard Bailly,et al.  LIPS2008: visual speech synthesis challenge , 2008, INTERSPEECH.

[9]  Hichem Sahli,et al.  Multimodal Unit Selection for 2D Audiovisual Text-to-Speech Synthesis , 2008, MLMI.

[10]  Lianhong Cai,et al.  Head Movement Synthesis Based on Semantic and Prosodic Features for a Chinese Expressive Avatar , 2007, 2007 IEEE International Conference on Acoustics, Speech and Signal Processing - ICASSP '07.

[11]  Lei Xie,et al.  Realistic Mouth-Synching for Speech-Driven Talking Face Using Articulatory Modelling , 2007, IEEE Transactions on Multimedia.

[12]  Harry Shum,et al.  Real-Time Bayesian 3-D Pose Tracking , 2006, IEEE Transactions on Circuits and Systems for Video Technology.

[13]  Lei Xie,et al.  Speech Animation Using Coupled Hidden Markov Models , 2006, 18th International Conference on Pattern Recognition (ICPR'06).

[14]  Jörn Ostermann,et al.  Parameterization of Mouth Images by LLE and PCA for Image-Based Facial Animation , 2006, 2006 IEEE International Conference on Acoustics Speech and Signal Processing Proceedings.

[15]  Ren-Hua Wang,et al.  Minimum Generation Error Training for HMM-Based Speech Synthesis , 2006, 2006 IEEE International Conference on Acoustics Speech and Signal Processing Proceedings.

[16]  Scott A. King,et al.  Creating speech-synchronized animation , 2005, IEEE Transactions on Visualization and Computer Graphics.

[17]  Keiichi Tokuda,et al.  Spectral conversion based on maximum likelihood estimation considering global variance of converted parameter , 2005, Proceedings. (ICASSP '05). IEEE International Conference on Acoustics, Speech, and Signal Processing, 2005..

[18]  Gavin C. Cawley,et al.  Near-videorealistic synthetic talking faces: implementation and evaluation , 2004, Speech Commun..

[19]  Tien-Tsin Wong,et al.  A real-time Cantonese text-to-audiovisual speech synthesizer , 2004, 2004 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[20]  P. Pérez,et al.  Poisson image editing , 2003, ACM Trans. Graph..

[21]  Satoshi Nakamura,et al.  Statistical multimodal integration for audio-visual speech processing , 2002, IEEE Trans. Neural Networks.

[22]  Hans Peter Graf,et al.  Triphone based unit selection for concatenative visual speech synthesis , 2002, 2002 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[23]  Keiichi Tokuda,et al.  HMM-based text-to-audio-visual speech synthesis , 2000, INTERSPEECH.

[24]  Hans Peter Graf,et al.  Photo-Realistic Talking-Heads from Image Samples , 2000, IEEE Trans. Multim..

[25]  Robert E. Donovan,et al.  The IBM trainable speech synthesis system , 1998, ICSLP.

[26]  David Salesin,et al.  Synthesizing realistic facial expressions from photographs , 1998, SIGGRAPH.

[27]  Tony Ezzat,et al.  MikeTalk: a talking facial display based on morphing visemes , 1998, Proceedings Computer Animation '98 (Cat. No.98EX169).

[28]  Christoph Bregler,et al.  Video Rewrite: Driving Visual Speech with Audio , 1997, SIGGRAPH.

[29]  Alex Acero,et al.  Recent improvements on Microsoft's trainable text-to-speech system-Whistler , 1997, 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[30]  Alan W. Black,et al.  Unit selection in a concatenative speech synthesis system using a large speech database , 1996, 1996 IEEE International Conference on Acoustics, Speech, and Signal Processing Conference Proceedings.

[31]  Keiichi Tokuda,et al.  Speech synthesis using HMMs with dynamic features , 1996, 1996 IEEE International Conference on Acoustics, Speech, and Signal Processing Conference Proceedings.

[32]  J. P. Lewis.  Fast Normalized Cross-Correlation , 2010 .

[33]  Keiichi Tokuda,et al.  An improved minimum generation error based model adaptation for HMM-based speech synthesis , 2009, INTERSPEECH.

[34]  Jörn Ostermann,et al.  Realistic facial animation system for interactive services , 2008, INTERSPEECH.

[35]  Tsuhan Chen,et al.  Audiovisual speech processing , 2001, IEEE Signal Process. Mag..

[36]  Matthew Turk,et al.  A Morphable Model For The Synthesis Of 3D Faces , 1999, SIGGRAPH.

[37]  Tony Ezzat,et al.  Mary101: a trainable videorealistic speech animation , 2022 .