Neural iTTS: Toward Synthesizing Speech in Real-time with End-to-end Neural Text-to-Speech Framework
[1] Satoshi Nakamura, et al. Listening while speaking: Speech chain by deep learning, 2017, 2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU).
[2] Yoshua Bengio, et al. Neural Machine Translation by Jointly Learning to Align and Translate, 2014, ICLR.
[3] Frieda Goldman-Eisler, et al. Segmentation of input in simultaneous translation, 1972, Journal of Psycholinguistic Research.
[4] Graham Neubig, et al. Learning to Translate in Real-time with Neural Machine Translation, 2016, EACL.
[5] Heiga Zen, et al. WaveNet: A Generative Model for Raw Audio, 2016, SSW.
[6] Gérard Bailly, et al. Adaptive Latency for Part-of-Speech Tagging in Incremental Text-to-Speech Synthesis, 2016, INTERSPEECH.
[7] Gérard Bailly, et al. HMM training strategy for incremental speech synthesis, 2015, INTERSPEECH.
[8] Shinnosuke Takamichi, et al. JSUT corpus: free large-scale Japanese speech corpus for end-to-end speech synthesis, 2017, ArXiv.
[9] Timo Baumann. Decision tree usage for incremental parametric speech synthesis, 2014, 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[10] Tomoki Toda, et al. Constructing a speech translation system using simultaneous interpretation data, 2013, IWSLT.
[11] Alexander H. Waibel, et al. Simultaneous translation of lectures and speeches, 2007, Machine Translation.
[12] Keiichi Tokuda, et al. Speech parameter generation algorithms for HMM-based speech synthesis, 2000, 2000 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP).
[13] Tomoki Toda, et al. Simple, lexicalized choice of translation timing for simultaneous speech translation, 2013, INTERSPEECH.
[14] David Schlangen, et al. Evaluating Prosodic Processing for Incremental Speech Synthesis, 2012, INTERSPEECH.
[15] Navdeep Jaitly, et al. Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions, 2017, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[16] Samy Bengio, et al. An Online Sequence-to-Sequence Model Using Partial Conditioning, 2015, NIPS.
[17] Keiichi Tokuda, et al. Simultaneous modeling of spectrum, pitch and duration in HMM-based speech synthesis, 1999, EUROSPEECH.
[18] Keikichi Hirose, et al. Accent Sandhi Estimation of Tokyo Dialect of Japanese Using Conditional Random Fields, 2017, IEICE Trans. Inf. Syst.
[19] Tomoki Toda, et al. Optimizing Segmentation Strategies for Simultaneous Speech Translation, 2014, ACL.
[20] Hermann Ney, et al. Automatic sentence segmentation and punctuation prediction for spoken language translation, 2006, IWSLT.
[21] Satoshi Nakamura, et al. Incremental TTS for Japanese Language, 2018, INTERSPEECH.
[22] Jae Lim, et al. Signal estimation from modified short-time Fourier transform, 1984.
[23] Tara N. Sainath, et al. Improving the Performance of Online Neural Transducer Models, 2017, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[24] Srinivas Bangalore, et al. Real-time Incremental Speech-to-Speech Translation of Dialogs, 2012, NAACL.
[25] Samy Bengio, et al. Tacotron: Towards End-to-End Speech Synthesis, 2017, INTERSPEECH.