Photo-real lips synthesis with trajectory-guided sample selection
Frank K. Soong | Lijuan Wang | Wei Han | Xiaojun Qian
[1] Alan W. Black, et al. Unit selection in a concatenative speech synthesis system using a large speech database, 1996, 1996 IEEE International Conference on Acoustics, Speech, and Signal Processing Conference Proceedings.
[2] Keiichi Tokuda, et al. Speech synthesis using HMMs with dynamic features, 1996, 1996 IEEE International Conference on Acoustics, Speech, and Signal Processing Conference Proceedings.
[3] Christoph Bregler, et al. Video Rewrite: Driving Visual Speech with Audio, 1997, SIGGRAPH.
[4] Alex Acero, et al. Recent improvements on Microsoft's trainable text-to-speech system-Whistler, 1997, 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing.
[5] Hans Peter Graf, et al. Sample-based synthesis of photo-realistic talking heads, 1998, Proceedings Computer Animation '98 (Cat. No.98EX169).
[6] Robert E. Donovan, et al. The IBM trainable speech synthesis system, 1998, ICSLP.
[7] Tony Ezzat, et al. MikeTalk: a talking facial display based on morphing visemes, 1998, Proceedings Computer Animation '98 (Cat. No.98EX169).
[8] Keiichi Tokuda, et al. Hidden Markov models based on multi-space probability distribution for pitch pattern modeling, 1999, 1999 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings. ICASSP99 (Cat. No.99CH36258).
[9] Keiichi Tokuda, et al. HMM-based text-to-audio-visual speech synthesis, 2000, INTERSPEECH.
[10] Hans Peter Graf, et al. Photo-Realistic Talking-Heads from Image Samples, 2000, IEEE Trans. Multim.
[11] Tsuhan Chen, et al. Audiovisual speech processing, 2001, IEEE Signal Process. Mag.
[12] Satoshi Nakamura, et al. Statistical multimodal integration for audio-visual speech processing, 2002, IEEE Trans. Neural Networks.
[13] Hans Peter Graf, et al. Triphone based unit selection for concatenative visual speech synthesis, 2002, 2002 IEEE International Conference on Acoustics, Speech, and Signal Processing.
[14] Patrick Pérez, et al. Poisson image editing, 2003, ACM Trans. Graph.
[15] Toshio Hirai, et al. Using 5 ms segments in concatenative speech synthesis, 2004, SSW.
[16] Gavin C. Cawley, et al. Near-videorealistic synthetic talking faces: implementation and evaluation, 2004, Speech Commun.
[17] Tomaso Poggio, et al. Trainable Videorealistic Speech Animation, 2004, FGR.
[18] Tsuhan Chen, et al. Integration strategies for audio-visual speech processing: applied to text-dependent speaker recognition, 2005, IEEE Transactions on Multimedia.
[19] Keiichi Tokuda, et al. Spectral conversion based on maximum likelihood estimation considering global variance of converted parameter, 2005, Proceedings. (ICASSP '05). IEEE International Conference on Acoustics, Speech, and Signal Processing, 2005.
[20] Jörn Ostermann, et al. Parameterization of Mouth Images by LLE and PCA for Image-Based Facial Animation, 2006, 2006 IEEE International Conference on Acoustics Speech and Signal Processing Proceedings.
[21] Zhenhua Ling. HMM-based Unit Selection Using F, 2006.
[22] Lei Xie, et al. Speech Animation Using Coupled Hidden Markov Models, 2006, 18th International Conference on Pattern Recognition (ICPR'06).
[23] Harry Shum, et al. Real-Time Bayesian 3-D Pose Tracking, 2006, IEEE Transactions on Circuits and Systems for Video Technology.
[24] Heiga Zen, et al. The HMM-based speech synthesis system (HTS) version 2.0, 2007, SSW.
[25] Jörn Ostermann, et al. Realistic facial animation system for interactive services, 2008, INTERSPEECH.
[26] Gérard Bailly, et al. LIPS2008: visual speech synthesis challenge, 2008, INTERSPEECH.
[27] Hichem Sahli, et al. Multimodal Unit Selection for 2D Audiovisual Text-to-Speech Synthesis, 2008, MLMI.
[28] Zhi-Jie Yan, et al. Rich-context Unit Selection (RUS) approach to high quality TTS, 2010, 2010 IEEE International Conference on Acoustics, Speech and Signal Processing.
[29] J. P. Lewis. Fast Normalized Cross-Correlation, 2010.