Expressive visual text to speech and expression adaptation using deep neural networks
Ranniery Maia | Yannis Stylianou | Roberto Cipolla | Jonathan Parker
[1] Heiga Zen, et al. Statistical Parametric Speech Synthesis Based on Speaker and Language Factorization, 2012, IEEE Transactions on Audio, Speech, and Language Processing.
[2] Mark J. F. Gales, et al. Complex cepstrum for statistical parametric speech synthesis, 2013, Speech Commun.
[3] Joo-Ho Lee, et al. Talking heads synthesis from audio with deep neural networks, 2015, 2015 IEEE/SICE International Symposium on System Integration (SII).
[4] Timothy F. Cootes, et al. Active Appearance Models, 1998, ECCV.
[5] A. Tikhonov, et al. Numerical Methods for the Solution of Ill-Posed Problems, 1995.
[6] Keiichi Tokuda, et al. Speech parameter generation algorithms for HMM-based speech synthesis, 2000, 2000 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.00CH37100).
[7] Björn Stenger, et al. Expressive Visual Text-to-Speech Using Active Appearance Models, 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.
[8] Frank K. Soong, et al. HMM trajectory-guided sample selection for photo-realistic talking head, 2014, Multimedia Tools and Applications.
[9] Quoc V. Le, et al. Distributed Representations of Sentences and Documents, 2014, ICML.
[10] Zhizheng Wu, et al. A study of speaker adaptation for DNN-based speech synthesis, 2015, INTERSPEECH.
[11] Helen M. Meng, et al. Multi-distribution deep belief network for speech synthesis, 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.
[12] Xu Li, et al. Low level descriptors based DBLSTM bottleneck feature for speech driven talking avatar, 2016, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[13] Tomaso A. Poggio, et al. Reanimating Faces in Images and Video, 2003, Comput. Graph. Forum.
[14] Lei Xie, et al. Photo-real talking head with deep bidirectional LSTM, 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[15] Jian Sun, et al. Deep Residual Learning for Image Recognition, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[16] Heiga Zen, et al. Statistical parametric speech synthesis using deep neural networks, 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.
[17] Tara N. Sainath, et al. Improving deep neural networks for LVCSR using rectified linear units and dropout, 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.
[18] Frank K. Soong, et al. A deep bidirectional LSTM approach for video-realistic talking head, 2016, Multimedia Tools and Applications.
[19] Jörn Ostermann, et al. Realistic facial expression synthesis for an image-based talking head, 2011, 2011 IEEE International Conference on Multimedia and Expo.