Ryuichi Yamamoto | Eunwoo Song | Min-Jae Hwang | Jae-Min Kim
[1] Yoshua Bengio,et al. Generative Adversarial Nets , 2014, NIPS.
[2] Yoshua Bengio,et al. Understanding the difficulty of training deep feedforward neural networks , 2010, AISTATS.
[3] Tao Qin,et al. FastSpeech 2: Fast and High-Quality End-to-End Text to Speech , 2021, ICLR.
[4] Thomas Quatieri,et al. Discrete-Time Speech Signal Processing: Principles and Practice , 2001 .
[5] Ryuichi Yamamoto,et al. Parallel Wavegan: A Fast Waveform Generation Model Based on Generative Adversarial Networks with Multi-Resolution Spectrogram , 2020, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[6] Tomoki Toda,et al. Investigation of training data size for real-time neural vocoders on CPUs , 2021 .
[7] Geoffrey E. Hinton,et al. Distilling the Knowledge in a Neural Network , 2015, ArXiv.
[8] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.
[9] Frank K. Soong,et al. Effective Spectral and Excitation Modeling Techniques for LSTM-RNN-Based Speech Synthesis Systems , 2017, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[10] Heiga Zen,et al. WaveNet: A Generative Model for Raw Audio , 2016, SSW.
[11] Tomoki Toda,et al. Espnet-TTS: Unified, Reproducible, and Integratable Open Source End-to-End Text-to-Speech Toolkit , 2020, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[12] Yuan Cao,et al. Leveraging Weakly Supervised Data to Improve End-to-end Speech-to-text Translation , 2018, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[13] Frank K. Soong,et al. LP-WaveNet: Linear Prediction-based WaveNet Speech Synthesis , 2018, 2020 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC).
[14] Tim Salimans,et al. Weight Normalization: A Simple Reparameterization to Accelerate Training of Deep Neural Networks , 2016, NIPS.
[15] Navdeep Jaitly,et al. Natural TTS Synthesis by Conditioning Wavenet on MEL Spectrogram Predictions , 2017, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[16] Xu Tan,et al. FastSpeech: Fast, Robust and Controllable Text to Speech , 2019, NeurIPS.
[17] Yoshua Bengio,et al. Attention-Based Models for Speech Recognition , 2015, NIPS.
[18] Tomoki Toda,et al. Speaker-Dependent WaveNet Vocoder , 2017, INTERSPEECH.
[19] Christopher M. Bishop. Mixture Density Networks , 1994 .
[20] Samy Bengio,et al. Tacotron: Towards End-to-End Speech Synthesis , 2017, INTERSPEECH.
[21] Sergey Rybin,et al. You Do Not Need More Data: Improving End-To-End Speech Recognition by Text-To-Speech Data Augmentation , 2020, 2020 13th International Congress on Image and Signal Processing, BioMedical Engineering and Informatics (CISP-BMEI).
[22] Adrian Łańcucki. FastPitch: Parallel Text-to-speech with Pitch Prediction , 2020, ArXiv.
[23] Hisashi Kawai,et al. Tacotron-Based Acoustic Model Using Phoneme Alignment for Practical Neural Text-to-Speech Systems , 2019, 2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU).
[24] Wei Ping,et al. ClariNet: Parallel Wave Generation in End-to-End Text-to-Speech , 2018, ICLR.
[25] Morgan Sonderegger,et al. Montreal Forced Aligner: Trainable Text-Speech Alignment Using Kaldi , 2017, INTERSPEECH.