Daniel Cremers | Thomas Kemp | Vladimir Golkov | Giorgio Fabbro