Yannis Stylianou | Ahmed Hussen Abdelaziz | Sachin Kajarekar | Anushree Prasanna Kumar | Gabriele Fanelli | Justin Binder | Chloe Seivwright
[1] J. C. Cotton. Normal "Visual Hearing" , 1935, Science.
[2] H. McGurk,et al. Hearing lips and seeing voices , 1976, Nature.
[3] P. Ekman,et al. Facial action coding system: a technique for the measurement of facial movement , 1978 .
[4] Matthew Brand,et al. Voice puppetry , 1999, SIGGRAPH.
[5] Keiichi Tokuda,et al. HMM-based text-to-audio-visual speech synthesis , 2000, INTERSPEECH.
[6] Ashish Kapoor,et al. Text-to-Audiovisual Speech Synthesizer , 2000, Virtual Worlds.
[7] Timothy F. Cootes,et al. Active Appearance Models , 2001, IEEE Trans. Pattern Anal. Mach. Intell..
[8] Li Zhang,et al. Spacetime faces: high resolution capture for modeling and animation , 2004, SIGGRAPH 2004.
[9] Ricardo Gutierrez-Osuna,et al. Audio/visual mapping with cross-modal hidden Markov models , 2005, IEEE Transactions on Multimedia.
[10] John Sabini,et al. Ekman's basic emotions: Why not love and jealousy? , 2005 .
[11] Lei Xie,et al. Realistic Mouth-Synching for Speech-Driven Talking Face Using Articulatory Modelling , 2007, IEEE Transactions on Multimedia.
[12] Geoffrey E. Hinton,et al. Visualizing Data using t-SNE , 2008 .
[13] Frank K. Soong,et al. A real-time text to audio-visual speech synthesis system , 2008, INTERSPEECH.
[14] Mark Pauly,et al. Example-based facial rigging , 2010, SIGGRAPH 2010.
[15] Paul A. Beardsley,et al. High-quality passive facial performance capture using anchor frames , 2011, SIGGRAPH 2011.
[16] Mark Pauly,et al. Realtime performance-based facial animation , 2011, ACM Trans. Graph..
[17] Moshe Mahler,et al. Dynamic units of visual speech , 2012, SCA '12.
[18] Paul Graham,et al. Driving high-resolution facial blendshapes with video performance capture , 2013, SIGGRAPH '13.
[19] Björn Stenger,et al. An expressive text-driven 3D talking head , 2013, SIGGRAPH '13.
[20] Kun Zhou,et al. Real-time facial animation on mobile devices , 2014, Graph. Model..
[21] Joo-Ho Lee,et al. Talking heads synthesis from audio with deep neural networks , 2015, 2015 IEEE/SICE International Symposium on System Integration (SII).
[22] Lei Xie,et al. Photo-real talking head with deep bidirectional LSTM , 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[23] Yoshua Bengio,et al. Attention-Based Models for Speech Recognition , 2015, NIPS.
[24] Thabo Beeler,et al. Real-time high-fidelity facial performance capture , 2015, ACM Trans. Graph..
[25] Yisong Yue,et al. A Decision Tree Framework for Spatiotemporal Sequence Prediction , 2015, KDD.
[26] Ben P. Milner,et al. Audio-to-Visual Speech Conversion Using Deep Neural Networks , 2016, INTERSPEECH.
[27] Kun Zhou,et al. Real-time facial animation with image-based dynamic avatars , 2016, ACM Trans. Graph..
[28] Jaakko Lehtinen,et al. Audio-driven facial animation by joint end-to-end learning of pose and emotion , 2017, ACM Trans. Graph..
[29] Samy Bengio,et al. Tacotron: Towards End-to-End Speech Synthesis , 2017, INTERSPEECH.
[30] Ranniery Maia,et al. Expressive visual text to speech and expression adaptation using deep neural networks , 2017, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[31] Shiguang Shan,et al. A Fully End-to-End Cascaded CNN for Facial Landmark Detection , 2017, 2017 12th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2017).
[32] Yisong Yue,et al. A deep learning approach for generalized speech animation , 2017, ACM Trans. Graph..
[33] Navdeep Jaitly,et al. Natural TTS Synthesis by Conditioning Wavenet on MEL Spectrogram Predictions , 2017, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[34] Yuxuan Wang,et al. Style Tokens: Unsupervised Style Modeling, Control and Transfer in End-to-End Speech Synthesis , 2018, ICML.
[35] Yuxuan Wang,et al. Towards End-to-End Prosody Transfer for Expressive Speech Synthesis with Tacotron , 2018, ICML.
[36] Subhransu Maji,et al. VisemeNet , 2018, ACM Trans. Graph..
[38] Yoshua Bengio,et al. ObamaNet: Photo-realistic lip-sync from text , 2017, ArXiv.
[39] Erich Elsen,et al. Efficient Neural Audio Synthesis , 2018, ICML.
[40] Barry-John Theobald,et al. Speaker-Independent Speech-Driven Visual Speech Synthesis using Domain-Adapted Acoustic Models , 2019, ICMI.
[41] Philip N. Garner,et al. Self-Attention for Speech Emotion Recognition , 2019, INTERSPEECH.
[42] Erik Marchi,et al. Self-supervised Learning of Visual Speech Features with Audiovisual Speech Enhancement , 2020, ArXiv.
[43] Paul Dixon,et al. Modality Dropout for Improved Performance-driven Talking Faces , 2020, ICMI.
[44] Justus Thies,et al. Neural Voice Puppetry: Audio-driven Facial Reenactment , 2019, ECCV.
[45] Chen Change Loy,et al. Everybody’s Talkin’: Let Me Talk as You Want , 2020, IEEE Transactions on Information Forensics and Security.