[1] Jürgen Schmidhuber, et al. Multi-dimensional Recurrent Neural Networks, 2007, ICANN.
[2] Gregory Diamos, et al. Fast Spectrogram Inversion Using Multi-Head Convolutional Neural Networks, 2018, IEEE Signal Processing Letters.
[3] Tara N. Sainath, et al. Modeling Time-Frequency Patterns with LSTM vs. Convolutional Architectures for LVCSR Tasks, 2016, INTERSPEECH.
[4] Prafulla Dhariwal, et al. Glow: Generative Flow with Invertible 1x1 Convolutions, 2018, NeurIPS.
[5] Max Welling, et al. Auto-Encoding Variational Bayes, 2013, ICLR.
[6] Karen Simonyan, et al. The challenge of realistic music generation: modelling raw audio at scale, 2018, NeurIPS.
[7] Mohammad Norouzi, et al. Pixel Recursive Super Resolution, 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[8] Douglas Eck, et al. Music Transformer, 2018, arXiv:1809.04281.
[9] Erich Elsen, et al. Efficient Neural Audio Synthesis, 2018, ICML.
[10] Matthias Bethge, et al. Generative Image Modeling Using Spatial LSTMs, 2015, NIPS.
[11] Alex Graves, et al. Generating Sequences With Recurrent Neural Networks, 2013, ArXiv.
[12] Dustin Tran, et al. Image Transformer, 2018, ICML.
[13] Geoffrey Zweig, et al. Exploring Multidimensional LSTMs for Large Vocabulary ASR, 2016, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[14] Lior Wolf, et al. VoiceLoop: Voice Fitting and Synthesis via a Phonological Loop, 2017, ICLR.
[15] Samy Bengio, et al. Tacotron: Towards End-to-End Speech Synthesis, 2017, INTERSPEECH.
[16] Andrew M. Dai, et al. Music Transformer: Generating Music with Long-Term Structure, 2018, ICLR.
[17] Jaakko Lehtinen, et al. Progressive Growing of GANs for Improved Quality, Stability, and Variation, 2017, ICLR.
[18] Yoshua Bengio, et al. Char2Wav: End-to-End Speech Synthesis, 2017, ICLR.
[19] Xi Chen, et al. PixelCNN++: Improving the PixelCNN with Discretized Logistic Mixture Likelihood and Other Modifications, 2017, ICLR.
[20] Ilya Sutskever, et al. Generating Long Sequences with Sparse Transformers, 2019, ArXiv.
[21] Torsten Dau, et al. Inversion of Auditory Spectrograms, Traditional Spectrograms, and Other Envelope Representations, 2015, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[22] Adam Coates, et al. Deep Voice: Real-time Neural Text-to-Speech, 2017, ICML.
[23] Navdeep Jaitly, et al. Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions, 2017, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[24] Heiga Zen, et al. Sample Efficient Adaptive Text-to-Speech, 2018, ICLR.
[25] Kumar Krishna Agrawal, et al. GANSynth: Adversarial Neural Audio Synthesis, 2019, ICLR.
[26] Joon Son Chung, et al. VoxCeleb2: Deep Speaker Recognition, 2018, INTERSPEECH.
[27] Heiga Zen, et al. Parallel WaveNet: Fast High-Fidelity Speech Synthesis, 2017, ICML.
[28] Yoshua Bengio, et al. ReNet: A Recurrent Neural Network Based Alternative to Convolutional Networks, 2015, ArXiv.
[29] Heiga Zen, et al. WaveNet: A Generative Model for Raw Audio, 2016, SSW.
[30] Heiga Zen, et al. Hierarchical Generative Modeling for Controllable Speech Synthesis, 2018, ICLR.
[31] Douglas Eck, et al. Enabling Factorized Piano Music Modeling and Generation with the MAESTRO Dataset, 2018, ICLR.
[32] Yutaka Matsuo, et al. Expressive Speech Synthesis via Modeling Expressions with Variational Autoencoder, 2018, INTERSPEECH.
[33] Yoshua Bengio, et al. SampleRNN: An Unconditional End-to-End Neural Audio Generation Model, 2016, ICLR.
[34] S. King, et al. The Blizzard Challenge 2013, 2013, The Blizzard Challenge 2013.
[35] Alex Graves, et al. Conditional Image Generation with PixelCNN Decoders, 2016, NIPS.
[36] Wei Ping, et al. ClariNet: Parallel Wave Generation in End-to-End Text-to-Speech, 2018, ICLR.
[37] Chris Donahue, et al. Adversarial Audio Synthesis, 2018, ICLR.
[38] T. Munich, et al. Offline Handwriting Recognition with Multidimensional Recurrent Neural Networks, 2008, NIPS.
[39] Chris Donahue, et al. Synthesizing Audio with Generative Adversarial Networks, 2018, ArXiv.
[40] Jae Lim, et al. Signal estimation from modified short-time Fourier transform, 1984.
[41] Oriol Vinyals, et al. Neural Discrete Representation Learning, 2017, NIPS.
[42] Sergio Gomez Colmenarejo, et al. Parallel Multiscale Autoregressive Density Estimation, 2017, ICML.
[43] Brian Kulis, et al. Conditioning Deep Generative Raw Audio Models for Structured Automatic Music, 2018, ISMIR.
[44] Jürgen Schmidhuber, et al. Long Short-Term Memory, 1997, Neural Computation.
[45] Koray Kavukcuoglu, et al. Pixel Recurrent Neural Networks, 2016, ICML.
[46] Alex Graves, et al. Grid Long Short-Term Memory, 2015, ICLR.
[47] Pieter Abbeel, et al. PixelSNAIL: An Improved Autoregressive Generative Model, 2017, ICML.
[48] Yannick Estève, et al. TED-LIUM 3: twice as much data and corpus repartition for experiments on speaker adaptation, 2018, SPECOM.
[49] Nal Kalchbrenner, et al. Generating High Fidelity Images with Subscale Pixel Networks and Multidimensional Upscaling, 2018, ICLR.
[50] C. Bishop. Mixture density networks, 1994.
[51] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.
[52] Yoshua Bengio, et al. Generative Adversarial Nets, 2014, NIPS.
[53] Jian Sun, et al. Deep Residual Learning for Image Recognition, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[54] Ryan Prenger, et al. WaveGlow: A Flow-based Generative Network for Speech Synthesis, 2018, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).