Speaker-Independent Speech-Driven Visual Speech Synthesis using Domain-Adapted Acoustic Models
Barry-John Theobald | Ahmed Hussen Abdelaziz | Thibaut Weise | Gabriele Fanelli | Justin Binder | Nicholas Apostoloff | Paul Dixon | Sachin Kajareker
[1] Hai Xuan Pham, et al. End-to-end Learning for 3D Facial Animation from Raw Waveforms of Speech, 2017, ArXiv.
[2] Hani Yehia, et al. Quantitative association of vocal-tract and facial behavior, 1998, Speech Commun.
[3] Maja Pantic, et al. End-to-End Speech-Driven Facial Animation with Temporal GANs, 2018, BMVC.
[4] S. J. Young, et al. Tree-based state tying for high accuracy acoustic modelling, 1994.
[5] Hao Li, et al. Realtime performance-based facial animation, 2011, ACM Trans. Graph.
[6] M. Pauly, et al. Example-based facial rigging, 2010, ACM Trans. Graph.
[7] P. Ekman, et al. Facial action coding system: a technique for the measurement of facial movement, 1978.
[8] Levent M. Arslan, et al. 3-D Face Point Trajectory Synthesis Using An Automatically Derived Visual Phoneme Similarity Matrix, 1998, AVSP.
[9] Mark J. F. Gales, et al. Maximum likelihood linear transformations for HMM-based speech recognition, 1998, Comput. Speech Lang.
[10] Lei Xie, et al. A coupled HMM approach to video-realistic speech animation, 2007, Pattern Recognit.
[11] P. Ekman, et al. Facial action coding system, 2019.
[12] Ben P. Milner, et al. Audio-to-Visual Speech Conversion Using Deep Neural Networks, 2016, INTERSPEECH.
[13] Joo-Ho Lee, et al. Talking heads synthesis from audio with deep neural networks, 2015, 2015 IEEE/SICE International Symposium on System Integration (SII).
[14] Ira Kemelmacher-Shlizerman, et al. Synthesizing Obama, 2017, ACM Trans. Graph.
[15] Kevin Barraclough, et al. I and i, 2001, BMJ: British Medical Journal.
[16] Ricardo Gutierrez-Osuna, et al. Audio/visual mapping with cross-modal hidden Markov models, 2005, IEEE Transactions on Multimedia.
[17] Sepp Hochreiter, et al. Self-Normalizing Neural Networks, 2017, NIPS.
[18] Daniel Povey, et al. The Kaldi Speech Recognition Toolkit, 2011.
[19] Matthew Brand, et al. Voice puppetry, 1999, SIGGRAPH.
[20] Lei Xie, et al. Realistic Mouth-Synching for Speech-Driven Talking Face Using Articulatory Modelling, 2007, IEEE Transactions on Multimedia.
[21] Hamid Aghajan, et al. Speech-Driven Facial Reenactment Using Conditional Generative Adversarial Networks, 2018, ArXiv.
[22] Ramón Fernández Astudillo, et al. Noise-Adaptive LDA: A New Approach for Speech Recognition Under Observation Uncertainty, 2013, IEEE Signal Processing Letters.
[23] Moshe Mahler, et al. Dynamic units of visual speech, 2012, SCA '12.
[24] Hans Peter Graf, et al. Photo-Realistic Talking-Heads from Image Samples, 2000, IEEE Trans. Multim.
[25] Jaakko Lehtinen, et al. Audio-driven facial animation by joint end-to-end learning of pose and emotion, 2017, ACM Trans. Graph.
[26] Jingwen Zhu, et al. Talking Face Generation by Conditional Recurrent Adversarial Network, 2018, IJCAI.
[27] Lukás Burget, et al. Sequence-discriminative training of deep neural networks, 2013, INTERSPEECH.
[28] Wesley Mattheyses, et al. Comprehensive many-to-many phoneme-to-viseme mapping and its application for concatenative visual speech synthesis, 2013, Speech Commun.
[29] Christoph Bregler, et al. Video Rewrite: Driving Visual Speech with Audio, 1997, SIGGRAPH.
[30] Tony Ezzat, et al. Trainable videorealistic speech animation, 2002, Sixth IEEE International Conference on Automatic Face and Gesture Recognition, 2004. Proceedings.
[31] Shiguang Shan, et al. A Fully End-to-End Cascaded CNN for Facial Landmark Detection, 2017, 2017 12th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2017).
[32] Yisong Yue, et al. A Decision Tree Framework for Spatiotemporal Sequence Prediction, 2015, KDD.
[33] Lei Xie, et al. Photo-real talking head with deep bidirectional LSTM, 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[34] H. B. Mann, et al. On a Test of Whether one of Two Random Variables is Stochastically Larger than the Other, 1947.
[35] Jenq-Neng Hwang, et al. Hidden Markov Model Inversion for Audio-to-Visual Conversion in an MPEG-4 Facial Animation System, 2001, J. VLSI Signal Process.
[36] Ricardo Gutierrez-Osuna, et al. Speech-driven facial animation with realistic dynamics, 2005, IEEE Transactions on Multimedia.
[37] Gérard Bailly, et al. A new trainable trajectory formation system for facial animation, 2006, ExLing.
[38] Björn Stenger, et al. Expressive Visual Text-to-Speech Using Active Appearance Models, 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.
[39] Frank K. Soong, et al. HMM trajectory-guided sample selection for photo-realistic talking head, 2014, Multimedia Tools and Applications.
[40] Yisong Yue, et al. A deep learning approach for generalized speech animation, 2017, ACM Trans. Graph.
[41] Dorothea Kolossa, et al. Learning Dynamic Stream Weights For Coupled-HMM-Based Audio-Visual Speech Recognition, 2015, IEEE/ACM Transactions on Audio, Speech, and Language Processing.