Paper information - HMM trajectory-guided sample selection for photo-realistic talking head - 字舞流文

HMM trajectory-guided sample selection for photo-realistic talking head

Lijuan Wang | F. Soong

[1] Zhen-Hua Ling, et al. DNN-based unit selection using frame-sized speech segments, 2016, 2016 10th International Symposium on Chinese Spoken Language Processing (ISCSLP).

[2] Gang Chen, et al. Computer-Assisted Audiovisual Language Learning, 2012, Computer.

[3] Matthew R. Scott, et al. Towards a Specialized Search Engine for Language Learners [Point of View], 2011.

[4] Frank K. Soong, et al. A Sparse and Low-rank approach to efficient face alignment for photo-real talking head synthesis, 2011, 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[5] Frank K. Soong, et al. Synthesizing visual speech trajectory with minimum generation error, 2011, 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[6] Frank K. Soong, et al. Synthesizing photo-real talking head via trajectory-guided sample selection, 2010, INTERSPEECH.

[7] Zhi-Jie Yan, et al. Rich-context Unit Selection (RUS) approach to high quality TTS, 2010, 2010 IEEE International Conference on Acoustics, Speech and Signal Processing.

[8] Gérard Bailly, et al. LIPS2008: visual speech synthesis challenge, 2008, INTERSPEECH.

[9] Hichem Sahli, et al. Multimodal Unit Selection for 2D Audiovisual Text-to-Speech Synthesis, 2008, MLMI.

[10] Lianhong Cai, et al. Head Movement Synthesis Based on Semantic and Prosodic Features for a Chinese Expressive Avatar, 2007, 2007 IEEE International Conference on Acoustics, Speech and Signal Processing - ICASSP '07.

[11] Lei Xie, et al. Realistic Mouth-Synching for Speech-Driven Talking Face Using Articulatory Modelling, 2007, IEEE Transactions on Multimedia.

[12] Harry Shum, et al. Real-Time Bayesian 3-D Pose Tracking, 2006, IEEE Transactions on Circuits and Systems for Video Technology.

[13] Lei Xie, et al. Speech Animation Using Coupled Hidden Markov Models, 2006, 18th International Conference on Pattern Recognition (ICPR'06).

[14] Jörn Ostermann, et al. Parameterization of Mouth Images by LLE and PCA for Image-Based Facial Animation, 2006, 2006 IEEE International Conference on Acoustics Speech and Signal Processing Proceedings.

[15] Ren-Hua Wang, et al. Minimum Generation Error Training for HMM-Based Speech Synthesis, 2006, 2006 IEEE International Conference on Acoustics Speech and Signal Processing Proceedings.

[16] Scott A. King, et al. Creating speech-synchronized animation, 2005, IEEE Transactions on Visualization and Computer Graphics.

[17] Keiichi Tokuda, et al. Spectral conversion based on maximum likelihood estimation considering global variance of converted parameter, 2005, Proceedings. (ICASSP '05). IEEE International Conference on Acoustics, Speech, and Signal Processing, 2005.

[18] Gavin C. Cawley, et al. Near-videorealistic synthetic talking faces: implementation and evaluation, 2004, Speech Commun.

[19] Tien-Tsin Wong, et al. A real-time Cantonese text-to-audiovisual speech synthesizer, 2004, 2004 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[20] P. Pérez, et al. Poisson image editing, 2003, ACM Trans. Graph.

[21] Satoshi Nakamura, et al. Statistical multimodal integration for audio-visual speech processing, 2002, IEEE Trans. Neural Networks.

[22] Hans Peter Graf, et al. Triphone based unit selection for concatenative visual speech synthesis, 2002, 2002 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[23] Keiichi Tokuda, et al. HMM-based text-to-audio-visual speech synthesis, 2000, INTERSPEECH.

[24] Hans Peter Graf, et al. Photo-Realistic Talking-Heads from Image Samples, 2000, IEEE Trans. Multim.

[25] Robert E. Donovan, et al. The IBM trainable speech synthesis system, 1998, ICSLP.

[26] David Salesin, et al. Synthesizing realistic facial expressions from photographs, 1998, SIGGRAPH.

[27] Tony Ezzat, et al. MikeTalk: a talking facial display based on morphing visemes, 1998, Proceedings Computer Animation '98 (Cat. No.98EX169).

[28] Christoph Bregler, et al. Video Rewrite: Driving Visual Speech with Audio, 1997, SIGGRAPH.

[29] Alex Acero, et al. Recent improvements on Microsoft's trainable text-to-speech system-Whistler, 1997, 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[30] Alan W. Black, et al. Unit selection in a concatenative speech synthesis system using a large speech database, 1996, 1996 IEEE International Conference on Acoustics, Speech, and Signal Processing Conference Proceedings.

[31] Keiichi Tokuda, et al. Speech synthesis using HMMs with dynamic features, 1996, 1996 IEEE International Conference on Acoustics, Speech, and Signal Processing Conference Proceedings.

[32] J. P. Lewis. Fast Normalized Cross-Correlation, 2010.

[33] Keiichi Tokuda, et al. An improved minimum generation error based model adaptation for HMM-based speech synthesis, 2009, INTERSPEECH.

[34] Jörn Ostermann, et al. Realistic facial animation system for interactive services, 2008, INTERSPEECH.

[35] Tsuhan Chen, et al. Audiovisual speech processing, 2001, IEEE Signal Process. Mag.

[36] Matthew Turk, et al. A Morphable Model For The Synthesis Of 3D Faces, 1999, SIGGRAPH.

[37] Tony Ezzat, et al. Mary101: a trainable videorealistic speech animation, 2022.