End-to-end Learning for 3D Facial Animation from Speech
暂无分享,去创建一个
[1] Jörn Ostermann,et al. Lifelike talking faces for interactive services , 2003, Proc. IEEE.
[2] Hai Xuan Pham,et al. Speech-Driven 3D Facial Animation with Implicit Emotional Awareness: A Deep Learning Approach , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).
[3] Ira Kemelmacher-Shlizerman,et al. Synthesizing Obama , 2017, ACM Trans. Graph..
[4] Matthew Turk,et al. A Morphable Model For The Synthesis Of 3D Faces , 1999, SIGGRAPH.
[5] Tara N. Sainath,et al. Deep Convolutional Neural Networks for Large-scale Speech Tasks , 2015, Neural Networks.
[6] Yisong Yue,et al. A deep learning approach for generalized speech animation , 2017, ACM Trans. Graph..
[7] Frank K. Soong,et al. A deep bidirectional LSTM approach for video-realistic talking head , 2016, Multimedia Tools and Applications.
[8] George Trigeorgis,et al. Adieu features? End-to-end speech emotion recognition using a deep convolutional recurrent network , 2016, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[9] Jaakko Lehtinen,et al. Audio-driven facial animation by joint end-to-end learning of pose and emotion , 2017, ACM Trans. Graph..
[10] Yoshua Bengio,et al. Convolutional networks for images, speech, and time series , 1998 .
[11] Lei Xie,et al. Realistic Mouth-Synching for Speech-Driven Talking Face Using Articulatory Modelling , 2007, IEEE Transactions on Multimedia.
[12] Tara N. Sainath,et al. Learning the speech front-end with raw waveform CLDNNs , 2015, INTERSPEECH.
[13] Frank K. Soong,et al. A new language independent, photo-realistic talking head driven by voice only , 2013, INTERSPEECH.
[14] Lianhong Cai,et al. Real-time synthesis of Chinese visual speech and facial expressions using MPEG-4 FAP features in a three-dimensional avatar , 2006, INTERSPEECH.
[15] Vladimir Pavlovic,et al. Robust Real-Time 3 D Face Tracking from RGBD Videos under Extreme Pose , Depth , and Expression Variations , 2017 .
[16] Brian C. Lovell,et al. Multi-Region Probabilistic Histograms for Robust and Scalable Identity Inference , 2009, ICB.
[17] Frédéric H. Pighin,et al. Expressive speech-driven facial animation , 2005, TOGS.
[18] Frank K. Soong,et al. Synthesizing photo-real talking head via trajectory-guided sample selection , 2010, INTERSPEECH.
[19] Tara N. Sainath,et al. Convolutional, Long Short-Term Memory, fully connected Deep Neural Networks , 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[20] Dimitri Palaz,et al. Estimating phoneme class conditional probabilities from raw speech signal using convolutional neural networks , 2013, INTERSPEECH.
[21] Lei Xie,et al. Head motion synthesis from speech using deep neural networks , 2015, Multimedia Tools and Applications.
[22] Yiying Tong,et al. FaceWarehouse: A 3D Facial Expression Database for Visual Computing , 2014, IEEE Transactions on Visualization and Computer Graphics.
[23] Björn Granström,et al. SynFace—Speech-Driven Facial Animation for Virtual Speech-Reading Support , 2009, EURASIP J. Audio Speech Music. Process..
[24] Frank K. Soong,et al. On the training aspects of Deep Neural Network (DNN) for parametric TTS synthesis , 2014, 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[25] Heiga Zen,et al. Statistical parametric speech synthesis using deep neural networks , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.
[26] Yoshua Bengio,et al. Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling , 2014, ArXiv.
[27] Alice Wang,et al. Assembling an expressive facial animation system , 2007, Sandbox '07.
[28] Gerald Penn,et al. Convolutional Neural Networks for Speech Recognition , 2014, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[29] Keiichi Tokuda,et al. HMM-based text-to-audio-visual speech synthesis , 2000, INTERSPEECH.
[30] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.
[31] Ron J. Weiss,et al. Speech acoustic modeling from raw multichannel waveforms , 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[32] Gerald Penn,et al. Applying Convolutional Neural Networks concepts to hybrid NN-HMM model for speech recognition , 2012, 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[33] Li Deng,et al. A deep convolutional neural network using heterogeneous pooling for trading acoustic invariance with phonetic confusion , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.
[34] Hai Xuan Pham,et al. Robust Real-Time 3D Face Tracking from RGBD Videos under Extreme Pose, Depth, and Expression Variation , 2016, 2016 Fourth International Conference on 3D Vision (3DV).
[35] Jürgen Schmidhuber,et al. Long Short-Term Memory , 1997, Neural Computation.
[36] Tomaso A. Poggio,et al. Reanimating Faces in Images and Video , 2003, Comput. Graph. Forum.
[37] Hai Xuan Pham,et al. Robust real-time performance-driven 3D face tracking , 2016, 2016 23rd International Conference on Pattern Recognition (ICPR).
[38] Tony Ezzat,et al. Trainable videorealistic speech animation , 2002, Sixth IEEE International Conference on Automatic Face and Gesture Recognition, 2004. Proceedings..
[39] James D. Edge,et al. Audio-visual feature selection and reduction for emotion classification , 2008, AVSP.
[40] Frank K. Soong,et al. Text Driven 3D Photo-Realistic Talking Head , 2011, INTERSPEECH.
[41] Christoph Bregler,et al. Video Rewrite: Driving Visual Speech with Audio , 1997, SIGGRAPH.