[1] Dongsuk Yook, et al. Audio-to-Visual Conversion Using Hidden Markov Models, 2002, PRICAI.
[2] Moshe Mahler, et al. Dynamic units of visual speech, 2012, SCA '12.
[3] Dong Yu, et al. Automatic Speech Recognition: A Deep Learning Approach, 2014.
[4] Jürgen Schmidhuber, et al. Long Short-Term Memory, 1997, Neural Computation.
[5] Yisong Yue, et al. A Decision Tree Framework for Spatiotemporal Sequence Prediction, 2015, KDD.
[6] Donald J. Berndt, et al. Using Dynamic Time Warping to Find Patterns in Time Series, 1994, KDD Workshop.
[7] Yisong Yue, et al. A deep learning approach for generalized speech animation, 2017, ACM Trans. Graph.
[8] Matthew Brand, et al. Voice puppetry, 1999, SIGGRAPH.
[9] P. Cantor. The Simpsons, 1999.
[10] Lei Xie, et al. Photo-real talking head with deep bidirectional LSTM, 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[11] Hichem Sahli, et al. Context dependent viseme models for voice driven animation, 2003, Proceedings EC-VIP-MC 2003. 4th EURASIP Conference focused on Video/Image Processing and Multimedia Communications (IEEE Cat. No.03EX667).
[12] Navdeep Jaitly, et al. Hybrid speech recognition with Deep Bidirectional LSTM, 2013, 2013 IEEE Workshop on Automatic Speech Recognition and Understanding.
[13] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.
[14] Geoffrey E. Hinton, et al. Speech recognition with deep recurrent neural networks, 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.
[15] Paul Lamere, et al. Sphinx-4: a flexible open source framework for speech recognition, 2004.
[16] Clément Farabet, et al. Torch7: A Matlab-like Environment for Machine Learning, 2011, NIPS 2011.
[17] Jürgen Schmidhuber, et al. Framewise phoneme classification with bidirectional LSTM and other neural network architectures, 2005, Neural Networks.
[18] Vesa T. Peltonen, et al. Audio-based context recognition, 2006, IEEE Transactions on Audio, Speech, and Language Processing.
[19] Frédéric H. Pighin, et al. Expressive speech-driven facial animation, 2005, TOGS.
[20] Carla Teixeira Lopes, et al. TIMIT Acoustic-Phonetic Continuous Speech Corpus, 2012.
[21] Wesley Mattheyses, et al. Audiovisual speech synthesis: An overview of the state-of-the-art, 2015, Speech Commun.
[22] Michael M. Cohen, et al. Modeling Coarticulation in Synthetic Visual Speech, 1993.
[23] Jonas Beskow, et al. Picture my voice: Audio to visual speech synthesis using artificial neural networks, 1999, AVSP.
[24] Tony Ezzat, et al. MikeTalk: a talking facial display based on morphing visemes, 1998, Proceedings Computer Animation '98 (Cat. No.98EX169).
[25] Eugene Fiume, et al. JALI, 2016, ACM Trans. Graph.
[26] Naomi Harte, et al. Phoneme-to-viseme Mapping for Visual Speech Recognition, 2012, ICPRAM.
[27] Navdeep Jaitly, et al. Towards End-To-End Speech Recognition with Recurrent Neural Networks, 2014, ICML.
[28] Shigeo Morishima, et al. Voice Animator: Automatic Lip-Synching in Limited Animation by Audio, 2017, ACE.
[29] Jaakko Lehtinen, et al. Audio-driven facial animation by joint end-to-end learning of pose and emotion, 2017, ACM Trans. Graph.
[30] F. Thomas, et al. The Illusion of Life: Disney Animation, 1981.
[31] Jonathan G. Fiscus, et al. DARPA TIMIT Acoustic-Phonetic Continuous Speech Corpus CD-ROM, 1993, NIST.
[32] Yuyu Xu, et al. A Practical and Configurable Lip Sync Method for Games, 2013, MIG.
[33] Ira Kemelmacher-Shlizerman, et al. Synthesizing Obama, 2017, ACM Trans. Graph.