Effective Spectral and Excitation Modeling Techniques for LSTM-RNN-Based Speech Synthesis Systems
[1] Joseph P. Campbell,et al. The DoD 4.8 kbps Standard (Proposed Federal Standard 1016) , 1991 .
[2] Keiichi Tokuda,et al. Mixed excitation for HMM-based speech synthesis , 2001, INTERSPEECH.
[3] Frank K. Soong,et al. On the training aspects of Deep Neural Network (DNN) for parametric TTS synthesis , 2014, 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[4] Hong-Goo Kang,et al. Enhancement of spectral clarity for HMM-based text-to-speech systems , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.
[5] Thomas P. Barnwell,et al. Mixed excitation LPC vocoder model , 2004 .
[6] Keiichi Tokuda,et al. Speech parameter generation algorithms for HMM-based speech synthesis , 2000, 2000 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.00CH37100).
[7] Zhizheng Wu,et al. From HMMs to DNNs: Where do the improvements come from? , 2016, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[8] Heiga Zen,et al. Statistical parametric speech synthesis using deep neural networks , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.
[9] Hong-Goo Kang,et al. Improved time-frequency trajectory excitation modeling for a statistical parametric speech synthesis system , 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[10] Yu Hongzhi,et al. Research on HMM_based speech synthesis for Lhasa dialect , 2011, 2011 International Conference on Image Analysis and Signal Processing.
[11] Eddie L. T. Choy,et al. Waveform Interpolation Speech Coder at 4 kb/s , 1998 .
[12] Andrew W. Senior,et al. Long short-term memory recurrent neural network architectures for large scale acoustic modeling , 2014, INTERSPEECH.
[13] Jürgen Schmidhuber,et al. Long Short-Term Memory , 1997, Neural Computation.
[14] Keiichi Tokuda,et al. Simultaneous modeling of spectrum, pitch and duration in HMM-based speech synthesis , 1999, EUROSPEECH.
[15] Masanori Morise,et al. WORLD: A Vocoder-Based High-Quality Speech Synthesis System for Real-Time Applications , 2016, IEICE Trans. Inf. Syst..
[16] Suryakanth V. Gangashetty,et al. An investigation of recurrent neural network architectures for statistical parametric speech synthesis , 2015, INTERSPEECH.
[17] Allen Gersho,et al. Real-time vector APC speech coding at 4800 bps with adaptive postfiltering , 1987, ICASSP '87. IEEE International Conference on Acoustics, Speech, and Signal Processing.
[18] Dong Yu,et al. Feature engineering in Context-Dependent Deep Neural Networks for conversational speech transcription , 2011, 2011 IEEE Workshop on Automatic Speech Recognition & Understanding.
[19] Willem Bastiaan Kleijn,et al. Continuous representations in linear predictive coding , 1991, [Proceedings] ICASSP 91: 1991 International Conference on Acoustics, Speech, and Signal Processing.
[20] Heiga Zen,et al. Unidirectional long short-term memory recurrent neural network with recurrent output layer for low-latency speech synthesis , 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[21] Frank K. Soong,et al. Improved Time-Frequency Trajectory Excitation Vocoder for DNN-Based Speech Synthesis , 2016, INTERSPEECH.
[22] Anthony J. Robinson,et al. Static and Dynamic Error Propagation Networks with Application to Speech Coding , 1987, NIPS.
[23] Hong-Goo Kang,et al. Deep neural network-based statistical parametric speech synthesis system using improved time-frequency trajectory excitation model , 2015, INTERSPEECH.
[24] Yoshihiko Nankaku,et al. The effect of neural networks in statistical parametric speech synthesis , 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[25] Jing Peng,et al. An Efficient Gradient-Based Algorithm for On-Line Training of Recurrent Network Trajectories , 1990, Neural Computation.
[26] Hong-Goo Kang,et al. Multi-class learning algorithm for deep neural network-based statistical parametric speech synthesis , 2016, 2016 24th European Signal Processing Conference (EUSIPCO).
[27] Frank K. Soong,et al. TTS synthesis with bidirectional LSTM based recurrent neural networks , 2014, INTERSPEECH.
[28] Sadaoki Furui,et al. Speaker-independent isolated word recognition using dynamic features of speech spectrum , 1986, IEEE Trans. Acoust. Speech Signal Process..
[29] Heiga Zen,et al. Details of the Nitech HMM-Based Speech Synthesis System for the Blizzard Challenge 2005 , 2007, IEICE Trans. Inf. Syst..
[30] Geoffrey E. Hinton,et al. Learning internal representations by error propagation , 1986 .
[31] Fumitada Itakura,et al. Speech analysis and synthesis methods developed at ECL in NTT - From LPC to LSP - , 1986, Speech Commun..
[32] Thomas Quatieri,et al. Discrete-Time Speech Signal Processing: Principles and Practice , 2001 .
[33] Biing-Hwang Juang,et al. Line spectrum pair (LSP) and speech data compression , 1984, ICASSP.
[34] Masanori Morise,et al. D4C, a band-aperiodicity estimator for high-quality speech synthesis , 2016, Speech Commun..
[35] W. Bastiaan Kleijn,et al. A speech coder based on decomposition of characteristic waveforms , 1995, 1995 International Conference on Acoustics, Speech, and Signal Processing.
[36] Hideki Kawahara,et al. Speech representation and transformation using adaptive interpolation of weighted spectrum: vocoder revisited , 1997, 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing.
[37] Ren-Hua Wang,et al. USTC System for Blizzard Challenge 2006: an Improved HMM-based Speech Synthesis Method , 2006, Blizzard Challenge.
[38] Geoffrey Zweig,et al. An introduction to computational networks and the computational network toolkit (invited talk) , 2014, INTERSPEECH.