BigVGAN: A Universal Neural Vocoder with Large-Scale Training