Modeling of Rakugo Speech and Its Limitations: Toward Speech Synthesis That Entertains Audiences
暂无分享,去创建一个
Junichi Yamagishi | Xin Wang | Shinji Takaki | Shuhei Kato | Erica Cooper | Yusuke Yasuda | J. Yamagishi | Xin Wang | Shinji Takaki | Erica Cooper | Yusuke Yasuda | Shuhei Kato
[1] Heiga Zen,et al. Hierarchical Generative Modeling for Controllable Speech Synthesis , 2018, ICLR.
[2] Mike Lewis,et al. MelNet: A Generative Model for Audio in the Frequency Domain , 2019, ArXiv.
[3] Soroosh Mariooryad,et al. Effective Use of Variational Embedding Capacity in Expressive End-to-End Speech Synthesis , 2019, ArXiv.
[4] Navdeep Jaitly,et al. Natural TTS Synthesis by Conditioning Wavenet on MEL Spectrogram Predictions , 2017, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[5] Heiga Zen,et al. Statistical parametric speech synthesis using deep neural networks , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.
[6] Kuldip K. Paliwal,et al. Bidirectional recurrent neural networks , 1997, IEEE Trans. Signal Process..
[7] Heiga Zen,et al. WaveNet: A Generative Model for Raw Audio , 2016, SSW.
[8] Francesco Visin,et al. A guide to convolution arithmetic for deep learning , 2016, ArXiv.
[9] Yuxuan Wang,et al. Style Tokens: Unsupervised Style Modeling, Control and Transfer in End-to-End Speech Synthesis , 2018, ICML.
[10] Masanori Morise,et al. WORLD: A Vocoder-Based High-Quality Speech Synthesis System for Real-Time Applications , 2016, IEICE Trans. Inf. Syst..
[11] A. R. Morris,et al. A Study of Accent. Research into Its Nature and Scope in the Light of Experimental Phonetics , 1936 .
[12] Lukasz Kaiser,et al. Attention is All you Need , 2017, NIPS.
[13] Yoshua Bengio,et al. Attention-Based Models for Speech Recognition , 2015, NIPS.
[14] E. Brunner,et al. The Nonparametric Behrens‐Fisher Problem: Asymptotic Theory and a Small‐Sample Approximation , 2000 .
[15] Günther Wenck. The phonemics of Japanese : questions and attempts , 1966 .
[16] Xin Wang,et al. Investigation of Enhanced Tacotron Text-to-speech Synthesis Systems with Self-attention for Pitch Accent Language , 2018, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[17] Mirella Lapata,et al. Long Short-Term Memory-Networks for Machine Reading , 2016, EMNLP.
[18] Jürgen Schmidhuber,et al. Long Short-Term Memory , 1997, Neural Computation.
[19] Li-Rong Dai,et al. Forward Attention in Sequence- To-Sequence Acoustic Modeling for Speech Synthesis , 2018, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[20] Yoshua Bengio,et al. Neural Machine Translation by Jointly Learning to Align and Translate , 2014, ICLR.
[21] Samy Bengio,et al. Tacotron: Towards End-to-End Speech Synthesis , 2017, INTERSPEECH.
[22] Yoshua Bengio,et al. Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation , 2014, EMNLP.
[23] Shujie Liu,et al. Neural Speech Synthesis with Transformer Network , 2018, AAAI.
[24] Yuxuan Wang,et al. Towards End-to-End Prosody Transfer for Expressive Speech Synthesis with Tacotron , 2018, ICML.
[25] Yoshua Bengio,et al. Zoneout: Regularizing RNNs by Randomly Preserving Hidden Activations , 2016, ICLR.
[26] Xin Wang,et al. Rakugo speech synthesis using segment-to-segment neural transduction and style tokens — toward speech synthesis for entertaining audiences , 2019 .
[27] Tomoki Toda,et al. Speaker-Dependent WaveNet Vocoder , 2017, INTERSPEECH.
[28] Nitish Srivastava,et al. Dropout: a simple way to prevent neural networks from overfitting , 2014, J. Mach. Learn. Res..
[29] Lauri Juvela,et al. A Comparison of Recent Waveform Generation and Acoustic Modeling Methods for Neural-Network-Based Speech Synthesis , 2018, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[30] Sergey Ioffe,et al. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift , 2015, ICML.
[31] The Sound System of Standard Japanese , 1959 .
[32] Xin Wang,et al. Investigating accuracy of pitch-accent annotations in neural network-based speech synthesis and denoising effects , 2018, INTERSPEECH.
[33] Heiga Zen,et al. LibriTTS: A Corpus Derived from LibriSpeech for Text-to-Speech , 2019, INTERSPEECH.
[34] Heiga Zen,et al. Generating Diverse and Natural Text-to-Speech Samples Using a Quantized Fine-Grained VAE and Autoregressive Prosody Prior , 2020, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[35] Yuxuan Wang,et al. Uncovering Latent Style Factors for Expressive Speech Synthesis , 2017, ArXiv.