Synthesising visual speech using dynamic visemes and deep learning architectures
[1] Petros Maragos,et al. Video-realistic expressive audio-visual speech synthesis for the Greek language , 2017, Speech Commun..
[2] Michael M. Cohen,et al. Modeling Coarticulation in Synthetic Visual Speech , 1993 .
[3] H. Zen,et al. An HMM-based speech synthesis system applied to English , 2002, Proceedings of 2002 IEEE Workshop on Speech Synthesis.
[4] Christoph Bregler,et al. Video Rewrite: Driving Visual Speech with Audio , 1997, SIGGRAPH.
[5] Björn Stenger,et al. Expressive Visual Text-to-Speech Using Active Appearance Models , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.
[6] Moshe Mahler,et al. Dynamic units of visual speech , 2012, SCA '12.
[7] Wesley Mattheyses,et al. Comprehensive many-to-many phoneme-to-viseme mapping and its application for concatenative visual speech synthesis , 2013, Speech Commun..
[8] Jonas Beskow,et al. Visual phonemic ambiguity and speechreading. , 2006, Journal of speech, language, and hearing research : JSLHR.
[9] Wesley Mattheyses,et al. Automatic Viseme Clustering for Audiovisual Speech Synthesis , 2011, INTERSPEECH.
[10] J. Gower. Generalized procrustes analysis , 1975 .
[11] Heiga Zen,et al. Statistical Parametric Speech Synthesis , 2007, IEEE International Conference on Acoustics, Speech, and Signal Processing.
[12] Lei Xie,et al. Photo-real talking head with deep bidirectional LSTM , 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[13] José Mario De Martino,et al. Facial animation based on context-dependent visemes , 2006, Comput. Graph..
[14] Geoffrey E. Hinton,et al. Rectified Linear Units Improve Restricted Boltzmann Machines , 2010, ICML.
[15] Tony Ezzat,et al. Trainable videorealistic speech animation , 2002, Sixth IEEE International Conference on Automatic Face and Gesture Recognition, 2004. Proceedings..
[16] Heiga Zen,et al. The HMM-based speech synthesis system (HTS) version 2.0 , 2007, SSW.
[17] John Salvatier,et al. Theano: A Python framework for fast computation of mathematical expressions , 2016, ArXiv.
[18] Geoffrey E. Hinton,et al. Deep Learning , 2015, Nature.
[19] Maurizio Omologo,et al. Automatic segmentation and labeling of speech based on Hidden Markov Models , 1993, Speech Commun..
[20] Jürgen Schmidhuber,et al. Long Short-Term Memory , 1997, Neural Computation.
[21] Timothy F. Cootes,et al. Active Appearance Models , 2001, IEEE Trans. Pattern Anal. Mach. Intell..
[22] Ben P. Milner,et al. Analysis of correlation between audio and visual speech features for clean audio feature prediction in noise , 2006, INTERSPEECH.
[23] Navdeep Jaitly,et al. Hybrid speech recognition with Deep Bidirectional LSTM , 2013, 2013 IEEE Workshop on Automatic Speech Recognition and Understanding.
[24] Frank K. Soong,et al. Synthesizing visual speech trajectory with minimum generation error , 2011, 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[25] Heiga Zen,et al. Statistical parametric speech synthesis using deep neural networks , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.
[26] Kuldip K. Paliwal,et al. Bidirectional recurrent neural networks , 1997, IEEE Trans. Signal Process..
[27] Keith Waters,et al. Computer Facial Animation, Second Edition , 1996 .
[28] Samy Bengio,et al. Tacotron: A Fully End-to-End Text-To-Speech Synthesis Model , 2017, ArXiv.
[29] Lei Xie,et al. Head motion synthesis from speech using deep neural networks , 2015, Multimedia Tools and Applications.
[30] Alan W. Black,et al. Unit selection in a concatenative speech synthesis system using a large speech database , 1996, 1996 IEEE International Conference on Acoustics, Speech, and Signal Processing Conference Proceedings.
[31] Barry-John Theobald,et al. HMM-based visual speech synthesis using dynamic visemes , 2015, AVSP.
[32] Yisong Yue,et al. A deep learning approach for generalized speech animation , 2017, ACM Trans. Graph..
[33] Keith Waters,et al. Computer facial animation , 1996 .
[34] Frank K. Soong,et al. A deep bidirectional LSTM approach for video-realistic talking head , 2016, Multimedia Tools and Applications.
[35] C. G. Fisher,et al. Confusions among visually perceived consonants. , 1968, Journal of speech and hearing research.
[36] Yisong Yue,et al. A Decision Tree Framework for Spatiotemporal Sequence Prediction , 2015, KDD.
[37] Michael Pucher,et al. Joint Audiovisual Hidden Semi-Markov Model-Based Speech Synthesis , 2014, IEEE Journal of Selected Topics in Signal Processing.
[38] Jun Yu,et al. Realtime speech-driven facial animation using Gaussian Mixture Models , 2014, 2014 IEEE International Conference on Multimedia and Expo Workshops (ICMEW).
[39] Larry B. Wallnau,et al. Statistics for the Behavioral Sciences , 1985 .
[40] S. Holm. A Simple Sequentially Rejective Multiple Test Procedure , 1979 .
[41] Ranniery Maia,et al. Expressive visual text to speech and expression adaptation using deep neural networks , 2017, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[42] Zhizheng Wu,et al. From HMMS to DNNS: Where do the improvements come from? , 2016, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[43] Tomaso Poggio,et al. Trainable Videorealistic Speech Animation , 2004, FGR.
[44] Raúl Rojas,et al. The Backpropagation Algorithm , 1996 .
[45] Ben P. Milner,et al. Audio-to-Visual Speech Conversion Using Deep Neural Networks , 2016, INTERSPEECH.