A deep learning approach for generalized speech animation
Yisong Yue | Jessica K. Hodgins | Moshe Mahler | Taehwan Kim | Iain A. Matthews | Sarah L. Taylor | James Krahe | Anastasio Garcia Rodriguez
[1] Ricardo Gutierrez-Osuna,et al. Audio/visual mapping with cross-modal hidden Markov models , 2005, IEEE Transactions on Multimedia.
[2] Frank K. Soong,et al. High quality lip-sync animation for 3D photo-realistic talking head , 2012, 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[3] Simon Baker,et al. Active Appearance Models Revisited , 2004, International Journal of Computer Vision.
[4] Jörn Ostermann,et al. Evaluation of an image-based talking head with realistic facial expression and head motion , 2011, Journal on Multimodal User Interfaces.
[5] Xin Tong,et al. Leveraging motion capture and 3D scanning for high-fidelity facial performance acquisition , 2011, ACM Trans. Graph..
[6] José Mario De Martino,et al. Facial animation based on context-dependent visemes , 2006, Comput. Graph..
[7] Lei Xie,et al. Photo-real talking head with deep bidirectional LSTM , 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[8] Jovan Popovic,et al. Deformation transfer for triangle meshes , 2004, ACM Trans. Graph..
[9] Navdeep Jaitly,et al. Towards End-To-End Speech Recognition with Recurrent Neural Networks , 2014, ICML.
[10] Wesley Mattheyses,et al. Comprehensive many-to-many phoneme-to-viseme mapping and its application for concatenative visual speech synthesis , 2013, Speech Commun..
[11] Kun Zhou,et al. 3D shape regression for real-time facial animation , 2013, ACM Trans. Graph..
[12] Geoffrey E. Hinton,et al. ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.
[13] Björn Stenger,et al. Expressive Visual Text-to-Speech Using Active Appearance Models , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.
[14] Jihun Yu,et al. Realtime facial animation with on-the-fly correctives , 2013, ACM Trans. Graph..
[15] Michael M. Cohen,et al. Modeling Coarticulation in Synthetic Visual Speech , 1993 .
[16] Johannes Fürnkranz,et al. Decision Tree , 2010, Encyclopedia of Machine Learning and Data Mining.
[17] Matthew Brand,et al. Voice puppetry , 1999, SIGGRAPH.
[18] Christoph Bregler,et al. Video Rewrite: Driving Visual Speech with Audio , 1997, SIGGRAPH.
[19] Mark Pauly,et al. Realtime performance-based facial animation , 2011, ACM Trans. Graph..
[20] Jun Yu,et al. Realtime speech-driven facial animation using Gaussian Mixture Models , 2014, 2014 IEEE International Conference on Multimedia and Expo Workshops (ICMEW).
[21] Nitish Srivastava,et al. Dropout: a simple way to prevent neural networks from overfitting , 2014, J. Mach. Learn. Res..
[22] Derek Bradley,et al. High-quality passive facial performance capture using anchor frames , 2011, ACM Trans. Graph..
[23] Thabo Beeler,et al. Real-time high-fidelity facial performance capture , 2015, ACM Trans. Graph..
[24] Eugene Fiume,et al. JALI , 2016, ACM Trans. Graph..
[25] Tomaso Poggio,et al. Trainable Videorealistic Speech Animation , 2004, FGR.
[26] Simon King,et al. Investigating the shortcomings of HMM synthesis , 2013, SSW.
[27] Li Zhang,et al. Spacetime faces: high resolution capture for modeling and animation , 2004, SIGGRAPH 2004.
[28] Lei Xie,et al. A coupled HMM approach to video-realistic speech animation , 2007, Pattern Recognit..
[29] Jonathan G. Fiscus,et al. DARPA TIMIT Acoustic-Phonetic Continuous Speech Corpus CD-ROM , 1993, NIST.
[30] Rich Caruana,et al. An empirical comparison of supervised learning algorithms , 2006, ICML.
[31] Yuyu Xu,et al. A Practical and Configurable Lip Sync Method for Games , 2013, MIG.
[32] Timothy F. Cootes,et al. Active Appearance Models , 1998, ECCV.
[33] Yisong Yue,et al. A Decision Tree Framework for Spatiotemporal Sequence Prediction , 2015, KDD.
[34] H. McGurk,et al. Hearing lips and seeing voices , 1976, Nature.
[35] Dana Z. Anderson. Neural information processing systems : Denver, Co, 1987 , 1988 .
[36] Mark Liberman,et al. Speaker identification on the SCOTUS corpus , 2008 .
[37] Ronald A. Cole,et al. Accurate visible speech synthesis based on concatenating variable length motion capture data , 2006, IEEE Transactions on Visualization and Computer Graphics.
[38] Quoc V. Le,et al. Sequence to Sequence Learning with Neural Networks , 2014, NIPS.
[39] Jürgen Schmidhuber,et al. Long Short-Term Memory , 1997, Neural Computation.
[40] Razvan Pascanu,et al. Theano: new features and speed improvements , 2012, ArXiv.
[41] Heiga Zen,et al. The HMM-based speech synthesis system (HTS) version 2.0 , 2007, SSW.
[42] Dimitri Palaz,et al. Towards End-to-End Speech Recognition , 2016 .
[43] Salil Deena,et al. Visual speech synthesis by modelling coarticulation dynamics using a non-parametric switching state-space model , 2010, ICMI-MLMI '10.
[44] J. J. Odell,et al. The Use of Context in Large Vocabulary Speech Recognition , 1995 .
[45] Frédéric H. Pighin,et al. Expressive speech-driven facial animation , 2005, TOGS.
[46] Moshe Mahler,et al. Dynamic units of visual speech , 2012, SCA '12.
[47] Kun Zhou,et al. Real-time facial animation on mobile devices , 2014, Graph. Model..
[48] Andrew Jones,et al. Driving High-Resolution Facial Scans with Video Performance Capture , 2014, ACM Trans. Graph..
[49] Gwenn Englebienne,et al. A probabilistic model for generating realistic lip movements from speech , 2007, NIPS.
[50] Heiga Zen,et al. WaveNet: A Generative Model for Raw Audio , 2016, SSW.
[51] Hao Li,et al. Realtime performance-based facial animation , 2011, ACM Trans. Graph..
[52] Barry-John Theobald,et al. Relating Objective and Subjective Performance Measures for AAM-Based Visual Speech Synthesis , 2012, IEEE Transactions on Audio, Speech, and Language Processing.
[53] Yoshua Bengio,et al. Neural Machine Translation by Jointly Learning to Align and Translate , 2014, ICLR.
[54] Gérard Bailly,et al. A new trainable trajectory formation system for facial animation , 2006, ExLing.
[55] Steve Young,et al. The HTK book , 1995 .
[56] Hans Peter Graf,et al. Photo-Realistic Talking-Heads from Image Samples , 2000, IEEE Trans. Multim..
[57] Jason Weston,et al. Natural Language Processing (Almost) from Scratch , 2011, J. Mach. Learn. Res..
[58] Michael Pucher,et al. Simultaneous speech and animation synthesis , 2011, SIGGRAPH '11.
[59] Yoshua Bengio,et al. Show, Attend and Tell: Neural Image Caption Generation with Visual Attention , 2015, ICML.