A minimum converted trajectory error (MCTE) approach to high quality speech-to-lips conversion
Frank K. Soong | Mark Hasegawa-Johnson | Xiaodan Zhuang | Lijuan Wang
[1] Keiichi Tokuda, et al. HMM-based text-to-audio-visual speech synthesis, 2000, INTERSPEECH.
[2] Tsuhan Chen, et al. Audiovisual speech processing, 2001, IEEE Signal Processing Magazine.
[3] Eric Moulines, et al. Continuous probabilistic transform for voice conversion, 1998, IEEE Transactions on Speech and Audio Processing.
[4] R. M. Stern, et al. Missing-feature approaches in speech recognition, 2005, IEEE Signal Processing Magazine.
[5] Lucas D. Terissi, et al. Audio-to-Visual Conversion Via HMM Inversion for Speech-Driven Facial Animation, 2008, SBIA.
[6] Ren-Hua Wang, et al. Minimum Generation Error Training for HMM-Based Speech Synthesis, 2006, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) Proceedings.
[7] Gérard Bailly, et al. LIPS2008: visual speech synthesis challenge, 2008, INTERSPEECH.
[8] Lei Xie, et al. A coupled HMM approach to video-realistic speech animation, 2007, Pattern Recognition.
[9] Ricardo Gutierrez-Osuna, et al. Audio/visual mapping with cross-modal hidden Markov models, 2005, IEEE Transactions on Multimedia.
[10] Tomoki Toda, et al. Voice Conversion Based on Maximum-Likelihood Estimation of Spectral Parameter Trajectory, 2007, IEEE Transactions on Audio, Speech, and Language Processing.
[11] Frank K. Soong, et al. Synthesizing photo-real talking head via trajectory-guided sample selection, 2010, INTERSPEECH.
[12] György Takács. Direct, modular and hybrid audio to visual speech conversion methods - a comparative study, 2009, INTERSPEECH.
[13] Thomas S. Huang, et al. Real-time speech-driven face animation with expressions using neural networks, 2002, IEEE Transactions on Neural Networks.