Max Morrison | Rithesh Kumar | Kundan Kumar | Prem Seetharaman | Aaron Courville | Yoshua Bengio
[1] R. G. McCurdy. Tentative standards for sound level meters, 1936, Electrical Engineering.
[2] Gunnar Fant, et al. Acoustic Theory of Speech Production, 1960.
[3] M. G. Bellanger, et al. Digital processing of speech signals, 1980, Proceedings of the IEEE.
[4] Jae S. Lim, et al. Signal estimation from modified short-time Fourier transform, 1983, ICASSP.
[5] Hideki Kawahara, et al. YIN, a fundamental frequency estimator for speech and music, 2002, The Journal of the Acoustical Society of America.
[6] Aapo Hyvärinen, et al. Estimation of Non-Normalized Statistical Models by Score Matching, 2005, Journal of Machine Learning Research.
[7] Mike Senior, et al. Mixing Secrets for the Small Studio, 2011.
[8] Yoshua Bengio, et al. Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation, 2014, EMNLP.
[9] Yoshua Bengio, et al. Generative Adversarial Nets, 2014, NIPS.
[10] Gautham J. Mysore, et al. Can We Automatically Transform Speech Recorded on Common Consumer Devices in Real-World Environments into Professional Production Quality Speech? A Dataset, Insights, and Challenges, 2015, IEEE Signal Processing Letters.
[11] Yoshua Bengio, et al. NICE: Non-linear Independent Components Estimation, 2014, ICLR.
[12] Jian Sun, et al. Deep Residual Learning for Image Recognition, 2015, CVPR 2016.
[13] Tim Salimans, et al. Weight Normalization: A Simple Reparameterization to Accelerate Training of Deep Neural Networks, 2016, NIPS.
[14] Chong Wang, et al. Deep Speech 2: End-to-End Speech Recognition in English and Mandarin, 2015, ICML.
[15] Masanori Morise, et al. WORLD: A Vocoder-Based High-Quality Speech Synthesis System for Real-Time Applications, 2016, IEICE Transactions on Information and Systems.
[16] Heiga Zen, et al. WaveNet: A Generative Model for Raw Audio, 2016, SSW.
[17] Ole Winther, et al. Autoencoding beyond Pixels Using a Learned Similarity Metric, 2015, ICML.
[18] Sepp Hochreiter, et al. GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium, 2017, NIPS.
[19] Gregory Frederick Diamos, et al. Block-Sparse Recurrent Neural Networks, 2017, arXiv.
[20] Yoshua Bengio, et al. SampleRNN: An Unconditional End-to-End Neural Audio Generation Model, 2016, ICLR.
[21] Prafulla Dhariwal, et al. Glow: Generative Flow with Invertible 1x1 Convolutions, 2018, NeurIPS.
[22] Suyog Gupta, et al. To Prune, or Not to Prune: Exploring the Efficacy of Pruning for Model Compression, 2017, ICLR.
[23] Yuichi Yoshida, et al. Spectral Normalization for Generative Adversarial Networks, 2018, ICLR.
[24] Jong Wook Kim, et al. CREPE: A Convolutional Representation for Pitch Estimation, 2018, ICASSP 2018.
[25] Adam Finkelstein, et al. FFTNet: A Real-Time Speaker-Dependent Neural Vocoder, 2018, ICASSP 2018.
[26] Erich Elsen, et al. Efficient Neural Audio Synthesis, 2018, ICML.
[27] Ilya Sutskever, et al. Generating Long Sequences with Sparse Transformers, 2019, arXiv.
[28] Ryan Prenger, et al. WaveGlow: A Flow-Based Generative Network for Speech Synthesis, 2018, ICASSP 2019.
[29] Yoshua Bengio, et al. MelGAN: Generative Adversarial Networks for Conditional Waveform Synthesis, 2019, NeurIPS.
[30] Douglas Eck, et al. Enabling Factorized Piano Music Modeling and Generation with the MAESTRO Dataset, 2018, ICLR.
[31] Fabian-Robert Stöter, et al. MUSDB18-HQ: An Uncompressed Version of MUSDB18, 2019.
[32] Ali Razavi, et al. Generating Diverse High-Fidelity Images with VQ-VAE-2, 2019, NeurIPS.
[33] Yang Song, et al. Generative Modeling by Estimating Gradients of the Data Distribution, 2019, NeurIPS.
[34] Frank Hutter, et al. Decoupled Weight Decay Regularization, 2017, ICLR.
[35] Xin Wang, et al. Neural Source-Filter-Based Waveform Model for Statistical Parametric Speech Synthesis, 2018, ICASSP 2019.
[36] Junichi Yamagishi, et al. CSTR VCTK Corpus: English Multi-speaker Corpus for CSTR Voice Cloning Toolkit (version 0.92), 2019.
[37] Pieter Abbeel, et al. Denoising Diffusion Probabilistic Models, 2020, NeurIPS.
[38] Wei Ping, et al. WaveFlow: A Compact Flow-Based Model for Raw Audio, 2019, ICML.
[39] Jaehyeon Kim, et al. HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis, 2020, NeurIPS.
[40] Ilya Sutskever, et al. Jukebox: A Generative Model for Music, 2020, arXiv.
[41] Gautham J. Mysore, et al. Controllable Neural Prosody Synthesis, 2020, INTERSPEECH.
[42] Erich Elsen, et al. High Fidelity Speech Synthesis with Adversarial Networks, 2019, ICLR.
[43] M. Hasegawa-Johnson, et al. Unsupervised Speech Decomposition via Triple Information Bottleneck, 2020, ICML.
[44] Heiga Zen, et al. WaveGrad: Estimating Gradients for Waveform Generation, 2020, ICLR.
[45] Bernt Schiele, et al. You Only Need Adversarial Supervision for Semantic Image Synthesis, 2020, ICLR.
[46] Jungil Kong, et al. Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech, 2021, ICML.
[47] Chris G. Willcocks, et al. Deep Generative Modelling: A Comparative Review of VAEs, GANs, Normalizing Flows, Energy-Based and Autoregressive Models, 2021, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[48] K. Simonyan, et al. End-to-End Adversarial Text-to-Speech, 2020, ICLR.
[49] Nicholas J. Bryan, et al. Segmented DAPS (Device and Produced Speech) Dataset, 2021.