Noise-Resilient Training Method for Face Landmark Generation From Speech
Chenliang Xu | Ross K. Maddox | Sefik Emre Eskimez | Zhiyao Duan
[1] Jenq-Neng Hwang,et al. Hidden Markov Model Inversion for Audio-to-Visual Conversion in an MPEG-4 Facial Animation System , 2001, J. VLSI Signal Process.
[2] Lei Xie,et al. Photo-real talking head with deep bidirectional LSTM , 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[3] Adrian K. C. Lee,et al. Auditory selective attention is enhanced by a task-irrelevant temporally coherent visual stimulus in human listeners , 2015, eLife.
[4] Vaibhava Goel,et al. Deep multimodal learning for Audio-Visual Speech Recognition , 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[5] Matthew Brand,et al. Voice puppetry , 1999, SIGGRAPH.
[6] Frank K. Soong,et al. Text Driven 3D Photo-Realistic Talking Head , 2011, INTERSPEECH.
[7] J. Gower. Generalized Procrustes analysis , 1975.
[8] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.
[9] Israel Cohen,et al. Audio-Visual Voice Activity Detection Using Diffusion Maps , 2015, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[10] Paul L. Rosin,et al. Video Realistic Talking Heads Using Hierarchical Non-Linear Speech-Appearance Models , 2003.
[11] Timothy F. Cootes,et al. Active Appearance Models , 2001, IEEE Trans. Pattern Anal. Mach. Intell.
[12] Yuxiao Hu,et al. Real-time conversion from a single 2D face image to a 3D text-driven emotive audio-visual avatar , 2008, 2008 IEEE International Conference on Multimedia and Expo.
[13] Lei Xie,et al. A statistical parametric approach to video-realistic text-driven talking avatar , 2013, Multimedia Tools and Applications.
[14] Chenliang Xu,et al. Generating Talking Face Landmarks from Speech , 2018, LVA/ICA.
[15] Heiga Zen,et al. Deep Learning for Acoustic Modeling in Parametric Speech Generation: A systematic review of existing techniques and future trends , 2015, IEEE Signal Processing Magazine.
[16] Houqiang Li,et al. Sign Language Recognition using 3D convolutional neural networks , 2015, 2015 IEEE International Conference on Multimedia and Expo (ICME).
[17] Frank K. Soong,et al. A new language independent, photo-realistic talking head driven by voice only , 2013, INTERSPEECH.
[18] Md. Zakir Hossain,et al. A Comprehensive Survey of Deep Learning for Image Captioning , 2018, ACM Comput. Surv.
[19] Hai Xuan Pham,et al. Speech-Driven 3D Facial Animation with Implicit Emotional Awareness: A Deep Learning Approach , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).
[20] Björn Stenger,et al. Expressive visual text-to-speech as an assistive technology for individuals with autism spectrum conditions , 2016, Comput. Vis. Image Underst.
[21] Hai Xuan Pham,et al. End-to-end Learning for 3D Facial Animation from Raw Waveforms of Speech , 2017, ArXiv.
[22] Joshua G. W. Bernstein,et al. Auditory and auditory-visual intelligibility of speech in fluctuating maskers for normal-hearing and hearing-impaired listeners , 2009, The Journal of the Acoustical Society of America.
[23] Paul L. Rosin,et al. Speech driven facial animation using a hidden Markov coarticulation model , 2004, ICPR 2004.
[24] Maja Pantic,et al. End-to-end visual speech recognition with LSTMs , 2017, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[25] Jon Barker,et al. An audio-visual corpus for speech perception and automatic speech recognition , 2006, The Journal of the Acoustical Society of America.
[26] Lei Xie,et al. A coupled HMM approach to video-realistic speech animation , 2007, Pattern Recognit.
[27] Ira Kemelmacher-Shlizerman,et al. Synthesizing Obama , 2017, ACM Trans. Graph.
[28] Kevin Wilson,et al. Looking to listen at the cocktail party , 2018, ACM Trans. Graph.
[29] Luca Antiga,et al. Automatic differentiation in PyTorch , 2017.
[30] Davis E. King,et al. Dlib-ml: A Machine Learning Toolkit , 2009, J. Mach. Learn. Res.
[31] Mark J. F. Gales,et al. Photo-realistic expressive text to talking head synthesis , 2013, INTERSPEECH.
[32] Simon Baker,et al. Active Appearance Models Revisited , 2004, International Journal of Computer Vision.
[33] Joon Son Chung,et al. The Conversation: Deep Audio-Visual Speech Enhancement , 2018, INTERSPEECH.
[34] Georgios Tzimiropoulos,et al. How Far are We from Solving the 2D & 3D Face Alignment Problem? (and a Dataset of 230,000 3D Facial Landmarks) , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[35] Takeo Kanade,et al. Multi-PIE , 2008, 2008 8th IEEE International Conference on Automatic Face & Gesture Recognition.
[36] Joon Son Chung,et al. You said that? , 2017, BMVC.
[37] Chenliang Xu,et al. Lip Movements Generation at a Glance , 2018, ECCV.
[38] Jaakko Lehtinen,et al. Audio-driven facial animation by joint end-to-end learning of pose and emotion , 2017, ACM Trans. Graph.
[39] Lucas D. Terissi,et al. Audio-to-Visual Conversion Via HMM Inversion for Speech-Driven Facial Animation , 2008, SBIA.
[40] M. Shamim Hossain,et al. Audio-Visual Emotion Recognition Using Big Data Towards 5G , 2016, Mob. Networks Appl.
[41] Timothy F. Cootes,et al. Active Shape Models-Their Training and Application , 1995, Comput. Vis. Image Underst.
[42] R. Freyman,et al. The role of visual speech cues in reducing energetic and informational masking , 2005, The Journal of the Acoustical Society of America.