Reinforce-Aligner: Reinforcement Alignment Search for Robust End-to-End Text-to-Speech
[1] Frank Hutter et al. Decoupled Weight Decay Regularization, 2017, ICLR.
[2] Navdeep Jaitly et al. Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions, 2017, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[3] Xu Tan et al. FastSpeech: Fast, Robust and Controllable Text to Speech, 2019, NeurIPS.
[4] Raymond Y. K. Lau et al. Least Squares Generative Adversarial Networks, 2016, 2017 IEEE International Conference on Computer Vision (ICCV).
[5] R. Kubichek et al. Mel-cepstral distance measure for objective speech quality assessment, 1993, Proceedings of IEEE Pacific Rim Conference on Communications, Computers and Signal Processing.
[6] Seong-Whan Lee et al. Multi-SpectroGAN: High-Diversity and High-Fidelity Spectrogram Generation with Adversarial Style Combination for Speech Synthesis, 2020, AAAI.
[7] Shuang Liang et al. EfficientTTS: An Efficient and High-Quality Text-to-Speech Architecture, 2020, ICML.
[8] Hideki Kawahara et al. Restructuring speech representations using a pitch-adaptive time-frequency smoothing and an instantaneous-frequency-based F0 extraction: Possible role of a repetitive structure in sounds, 1999, Speech Communication.
[9] Adam Coates et al. Deep Voice: Real-time Neural Text-to-Speech, 2017, ICML.
[10] Chao Weng et al. VARA-TTS: Non-Autoregressive Text-to-Speech Synthesis based on Very Deep VAE with Residual Attention, 2021, arXiv.
[11] Sungwon Kim et al. Glow-TTS: A Generative Flow for Text-to-Speech via Monotonic Alignment Search, 2020, NeurIPS.
[12] Shujie Liu et al. Neural Speech Synthesis with Transformer Network, 2018, AAAI.
[13] Heiga Zen et al. Non-Attentive Tacotron: Robust and Controllable Neural TTS Synthesis Including Unsupervised Duration Modeling, 2020, arXiv.
[14] Tomoki Toda et al. An Investigation of Multi-Speaker Training for WaveNet Vocoder, 2017, 2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU).
[15] Adrian Łańcucki. FastPitch: Parallel Text-to-speech with Pitch Prediction, 2020, arXiv.
[16] Marco Cuturi et al. Soft-DTW: a Differentiable Loss Function for Time-Series, 2017, ICML.
[17] Soroosh Mariooryad et al. Wave-Tacotron: Spectrogram-free end-to-end text-to-speech synthesis, 2020, arXiv.
[18] Jae Lim et al. Signal estimation from modified short-time Fourier transform, 1984.
[19] Samy Bengio et al. Tacotron: Towards End-to-End Speech Synthesis, 2017, INTERSPEECH.
[20] Ming Liu et al. RobuTrans: A Robust Transformer-Based Text-to-Speech Model, 2020, AAAI.
[21] Yoshua Bengio et al. MelGAN: Generative Adversarial Networks for Conditional Waveform Synthesis, 2019, NeurIPS.
[22] Keiichi Tokuda et al. Simultaneous modeling of spectrum, pitch and duration in HMM-based speech synthesis, 1999, EUROSPEECH.
[23] Hiroaki Sakoe et al. A Dynamic Programming Approach to Continuous Speech Recognition, 1971.
[24] D. Lim et al. JDI-T: Jointly trained Duration Informed Transformer for Text-To-Speech without Explicit Alignment, 2020, INTERSPEECH.
[25] Kyomin Jung et al. Bidirectional Variational Inference for Non-Autoregressive Text-to-Speech, 2021, ICLR.
[26] Erich Elsen et al. End-to-End Adversarial Text-to-Speech, 2020, arXiv.
[27] Lilly Irani et al. Amazon Mechanical Turk, 2018, Advances in Intelligent Systems and Computing.
[28] Ondrej Dusek et al. SpeedySpeech: Efficient Neural Speech Synthesis, 2020, INTERSPEECH.
[29] Enhong Chen et al. LightSpeech: Lightweight and Fast Text to Speech with Neural Architecture Search, 2021, ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[30] John Williamson et al. A High Performance Spelling System based on EEG-EOG Signals With Visual Feedback, 2018, IEEE Transactions on Neural Systems and Rehabilitation Engineering.
[31] Heung-Il Suk et al. Subject and class specific frequency bands selection for multiclass motor imagery classification, 2011, Int. J. Imaging Syst. Technol.
[32] Jaehyeon Kim et al. HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis, 2020, NeurIPS.
[33] Sercan Ömer Arik et al. Deep Voice 2: Multi-Speaker Neural Text-to-Speech, 2017, NIPS.
[34] Heiga Zen et al. Parallel Tacotron: Non-Autoregressive and Controllable TTS, 2020, ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).