Multi-Singer: Fast Multi-Singer Singing Voice Vocoder With A Large-Scale Corpus
Zhou Zhao | Yi Ren | Rongjie Huang | Jinglin Liu | Feiyang Chen | Chenye Cui