Fast, Compact, and High Quality LSTM-RNN Based Statistical Parametric Speech Synthesizers for Mobile Devices
暂无分享,去创建一个
Heiga Zen | Fergus Henderson | Yannis Agiomyrgiannakis | Przemyslaw Szczepaniak | Niels Egberts | H. Zen | Yannis Agiomyrgiannakis | Przemyslaw Szczepaniak | N. Egberts | Fergus Henderson | Niels Egberts
[1] Heiga Zen. Acoustic Modeling for Speech Synthesis: from HMM to RNN , 2015 .
[2] Simon King,et al. Investigating gated recurrent neural networks for speech synthesis , 2016 .
[3] Heiga Zen,et al. Deep learning in speech synthesis , 2013, SSW.
[4] Heiga Zen,et al. Deep mixture density networks for acoustic modeling in statistical parametric speech synthesis , 2014, 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[5] Zhizheng Wu,et al. Minimum trajectory error training for deep neural networks, combined with stacked bottleneck features , 2015, INTERSPEECH.
[6] Peter F. Brown,et al. The acoustic-modeling problem in automatic speech recognition , 1987 .
[7] Mark J. F. Gales,et al. Tail distribution modelling using the richter and power exponential distributions , 1999, EUROSPEECH.
[8] Alexander Gutkin,et al. Recent Advances in Google Real-Time HMM-Driven Unit Selection Synthesizer , 2016, INTERSPEECH.
[9] Yoshihiko Nankaku,et al. The effect of neural networks in statistical parametric speech synthesis , 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[10] Tomoki Toda,et al. Probablistic modelling of F0 in unvoiced regions in HMM based speech synthesis , 2009, 2009 IEEE International Conference on Acoustics, Speech and Signal Processing.
[11] Zhizheng Wu,et al. From HMMS to DNNS: Where do the improvements come from? , 2016, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[12] Frank K. Soong,et al. Modeling DCT parameterized F0 trajectory at intonation phrase level with DNN or decision tree , 2014, INTERSPEECH.
[13] Paavo Alku,et al. Voice source modelling using deep neural networks for statistical parametric speech synthesis , 2014, 2014 22nd European Signal Processing Conference (EUSIPCO).
[14] Paul J. Werbos,et al. Backpropagation Through Time: What It Does and How to Do It , 1990, Proc. IEEE.
[15] Frank K. Soong,et al. TTS synthesis with bidirectional LSTM based recurrent neural networks , 2014, INTERSPEECH.
[16] Georg Heigold,et al. An empirical study of learning rates in deep neural networks for speech recognition , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.
[17] Jürgen Schmidhuber,et al. Long Short-Term Memory , 1997, Neural Computation.
[18] Daniel J. Hsu,et al. Loss Minimization and Parameter Estimation with Heavy Tails , 2013, J. Mach. Learn. Res..
[19] Rohit Prabhavalkar,et al. On the Efficient Representation and Execution of Deep Acoustic Models , 2016, INTERSPEECH.
[20] K. Koishida,et al. Vector quantization of speech spectral parameters using statistics of dynamic features , 1997 .
[21] Bhuvana Ramabhadran,et al. Prosody contour prediction with long short-term memory, bi-directional, deep recurrent neural networks , 2014, INTERSPEECH.
[22] Zhizheng Wu,et al. Fusion of multiple parameterisations for DNN-based sinusoidal speech synthesis with multi-task learning , 2015, INTERSPEECH.
[23] Geoffrey E. Hinton,et al. On rectified linear units for speech processing , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.
[24] 吉村 貴克,et al. Simultaneous modeling of phonetic and prosodic parameters,and characteristic conversion for HMM-based text-to-speech systems , 2002 .
[25] Yannis Agiomyrgiannakis,et al. Vocaine the vocoder and applications in speech synthesis , 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[26] Cassia Valentini-Botinhao,et al. Modelling acoustic feature dependencies with artificial neural networks: Trajectory-RNADE , 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[27] Frank K. Soong,et al. On the training aspects of Deep Neural Network (DNN) for parametric TTS synthesis , 2014, 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[28] Frank K. Soong,et al. Multi-speaker modeling and speaker adaptation for DNN-based TTS synthesis , 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[29] Heiga Zen,et al. Unidirectional long short-term memory recurrent neural network with recurrent output layer for low-latency speech synthesis , 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[30] Georg Heigold,et al. Multiframe deep neural networks for acoustic modeling , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.
[31] Srikanth Ronanki,et al. Robust TTS duration modelling using DNNS , 2016, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[32] Heiga Zen,et al. Statistical Parametric Speech Synthesis , 2007, 2007 IEEE International Conference on Acoustics, Speech and Signal Processing - ICASSP '07.
[33] S. King,et al. Combining a vector space representation of linguistic context with a deep neural network for text-to-speech synthesis , 2013, SSW.
[34] Heiga Zen,et al. Statistical parametric speech synthesis using deep neural networks , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.
[35] Keiichi Tokuda,et al. Speech parameter generation algorithms for HMM-based speech synthesis , 2000, 2000 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.00CH37100).
[36] Haifeng Li,et al. Sequence error (SE) minimization training of neural network for voice conversion , 2014, INTERSPEECH.
[37] Andrew W. Senior,et al. Long short-term memory recurrent neural network architectures for large scale acoustic modeling , 2014, INTERSPEECH.
[38] Simon King,et al. Deep neural networks employing Multi-Task Learning and stacked bottleneck features for speech synthesis , 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[39] Alexander Gutkin,et al. Quantized HMMs for low footprint text-to-speech synthesis , 2010, INTERSPEECH.
[40] Kai Yu,et al. An investigation of implementation and performance analysis of DNN based speech synthesis system , 2014, 2014 12th International Conference on Signal Processing (ICSP).
[41] Zhizheng Wu,et al. Sentence-level control vectors for deep neural network speech synthesis , 2015, INTERSPEECH.
[42] Anthony J. Robinson,et al. Static and Dynamic Error Propagation Networks with Application to Speech Coding , 1987, NIPS.