Speech-driven facial animation with spectral gathering and temporal attention