Multi-instrument Music Synthesis with Spectrogram Diffusion
Jesse Engel | Josh Gardner | Ian Simon | Adam Roberts | Curtis Hawthorne | Neil Zeghidour | Ethan Manilow
[1] Jonathan Ho. Classifier-Free Diffusion Guidance , 2022, ArXiv.
[2] David J. Fleet,et al. Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding , 2022, NeurIPS.
[3] Oriol Vinyals,et al. Flamingo: a Visual Language Model for Few-Shot Learning , 2022, ArXiv.
[4] Zeyu Jin,et al. Music Enhancement via Image Translation and Vocoding , 2022, IEEE International Conference on Acoustics, Speech, and Signal Processing.
[5] P. Esling,et al. Streamable Neural Audio Synthesis With Non-Causal Convolutions , 2022, ArXiv.
[6] Prafulla Dhariwal,et al. Hierarchical Text-Conditional Image Generation with CLIP Latents , 2022, ArXiv.
[7] Marc van Zee,et al. Scaling Up Models and Data with t5x and seqio , 2022, J. Mach. Learn. Res.
[8] Albert Gu,et al. It's Raw! Audio Generation with State-Space Models , 2022, ICML.
[9] Oriol Vinyals,et al. General-purpose, long-context autoregressive modeling with Perceiver AR , 2022, ICML.
[10] Tim Salimans,et al. Progressive Distillation for Fast Sampling of Diffusion Models , 2022, ICLR.
[11] Cheng-Zhi Anna Huang,et al. MIDI-DDSP: Detailed Control of Musical Performance via Hierarchical Modeling , 2021, ArXiv.
[12] Jesse Engel,et al. MT3: Multi-Task Multitrack Music Transcription , 2021, ICLR.
[13] Aaron C. Courville,et al. Chunked Autoregressive GAN for Conditional Waveform Synthesis , 2021, ICLR.
[14] Vadim Popov,et al. Diffusion-Based Voice Conversion with Fast Maximum Likelihood Sampling Scheme , 2021, ICLR.
[15] Marco Tagliasacchi,et al. SoundStream: An End-to-End Neural Audio Codec , 2022, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[16] Philippe Esling,et al. RAVE: A variational autoencoder for fast and high-quality neural audio synthesis , 2021, ArXiv.
[17] Curtis Hawthorne,et al. Sequence-to-Sequence Piano Transcription with Transformers , 2021, ISMIR.
[18] Diederik P. Kingma,et al. Variational Diffusion Models , 2021, ArXiv.
[19] Gaetan Hadjeres,et al. CRASH: Raw Audio Score-based Generative Modeling for Controllable High-resolution Drum Sound Synthesis , 2021, ISMIR.
[20] Tamar Rott Shaham,et al. Catch-A-Waveform: Learning to Generate Audio from a Single Short Example , 2021, NeurIPS.
[21] Songxiang Liu,et al. DiffSVC: A Diffusion Probabilistic Model for Singing Voice Conversion , 2021, 2021 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU).
[22] Tasnima Sadekova,et al. Grad-TTS: A Diffusion Probabilistic Model for Text-to-Speech , 2021, ICML.
[23] J. Yamagishi,et al. Text-to-Speech Synthesis Techniques for MIDI-to-Audio Synthesis , 2021, 11th ISCA Speech Synthesis Workshop (SSW 11).
[24] Junhyeok Lee,et al. NU-Wave: A Diffusion Probabilistic Model for Neural Audio Upsampling , 2021, Interspeech 2021.
[25] Nam Soo Kim,et al. Diff-TTS: A Denoising Diffusion Model for Text-to-Speech , 2021, Interspeech.
[26] Curtis Hawthorne,et al. Symbolic Music Generation with Diffusion Models , 2021, ISMIR.
[27] Prafulla Dhariwal,et al. Improved Denoising Diffusion Probabilistic Models , 2021, ICML.
[28] Ron J. Weiss,et al. Wave-Tacotron: Spectrogram-Free End-to-End Text-to-Speech Synthesis , 2020, ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[29] Bryan Catanzaro,et al. DiffWave: A Versatile Diffusion Model for Audio Synthesis , 2020, ICLR.
[30] Heiga Zen,et al. WaveGrad: Estimating Gradients for Waveform Generation , 2020, ICLR.
[31] Zhou Zhao,et al. DiffSinger: Diffusion Acoustic Model for Singing Voice Synthesis , 2021, ArXiv.
[32] Kou Tanaka,et al. VoiceGrad: Non-Parallel Any-to-Many Voice Conversion With Annealed Langevin Dynamics , 2020, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[33] Dominik Roblek,et al. SEANet: A Multi-modal Speech Enhancement Network , 2020, INTERSPEECH.
[34] Jasper Snoek,et al. A Spectral Energy Distance for Parallel Speech Synthesis , 2020, NeurIPS.
[35] Pieter Abbeel,et al. Denoising Diffusion Probabilistic Models , 2020, NeurIPS.
[36] Curtis Hawthorne,et al. Self-supervised Pitch Detection by Inverse Audio Synthesis , 2020 .
[37] Ilya Sutskever,et al. Jukebox: A Generative Model for Music , 2020, ArXiv.
[38] Aren Jansen,et al. Towards Learning a Universal Non-Semantic Representation of Speech , 2020, INTERSPEECH.
[39] Chenjie Gu,et al. DDSP: Differentiable Digital Signal Processing , 2020, ICLR.
[40] Colin Raffel,et al. Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer , 2019, J. Mach. Learn. Res.
[41] Prem Seetharaman,et al. Simultaneous Separation and Transcription of Mixtures with Multiple Polyphonic and Percussive Instruments , 2019, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[42] Aäron van den Oord,et al. Towards realistic MIDI instrument synthesizers , 2020 .
[43] Yoshua Bengio,et al. MelGAN: Generative Adversarial Networks for Conditional Waveform Synthesis , 2019, NeurIPS.
[44] Jonathan Le Roux,et al. Cutting Music Source Separation Some Slakh: A Dataset to Study the Impact of Training Data Quality and Quantity , 2019, 2019 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA).
[45] Dominik Roblek,et al. Fréchet Audio Distance: A Reference-Free Metric for Evaluating Music Enhancement Algorithms , 2019, INTERSPEECH.
[46] Kumar Krishna Agrawal,et al. GANSynth: Adversarial Neural Audio Synthesis , 2019, ICLR.
[47] Jong Wook Kim,et al. Neural Music Synthesis for Flexible Timbre Control , 2018, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[48] Ryan Prenger,et al. Waveglow: A Flow-based Generative Network for Speech Synthesis , 2018, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[49] Douglas Eck,et al. Enabling Factorized Piano Music Modeling and Generation with the MAESTRO Dataset , 2018, ICLR.
[50] Lior Wolf,et al. A Universal Music Translation Network , 2018, ICLR.
[51] Chris Donahue,et al. Adversarial Audio Synthesis , 2018, ICLR.
[52] Gaurav Sharma,et al. Creating a Multitrack Classical Music Performance Dataset for Multimodal Music Analysis: Challenges, Insights, and Applications , 2016, IEEE Transactions on Multimedia.
[53] Chenjie Gu,et al. Fast and Flexible Neural Audio Synthesis , 2019, ISMIR.
[54] Nicolas Usunier,et al. SING: Symbol-to-Instrument Neural Generator , 2018, NeurIPS.
[55] Brian Kulis,et al. Conditioning Deep Generative Raw Audio Models for Structured Automatic Music , 2018, ISMIR.
[56] Noam Shazeer,et al. Adafactor: Adaptive Learning Rates with Sublinear Memory Cost , 2018, ICML.
[57] Erich Elsen,et al. Efficient Neural Audio Synthesis , 2018, ICML.
[58] Navdeep Jaitly,et al. Natural TTS Synthesis by Conditioning Wavenet on MEL Spectrogram Predictions , 2017, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[59] Aaron C. Courville,et al. FiLM: Visual Reasoning with a General Conditioning Layer , 2017, AAAI.
[60] Johan Pauwels,et al. GuitarSet: A Dataset for Guitar Transcription , 2018, ISMIR.
[61] Oriol Vinyals,et al. Neural Discrete Representation Learning , 2017, NIPS.
[62] Lukasz Kaiser,et al. Attention is All you Need , 2017, NIPS.
[63] Karen Simonyan,et al. Neural Audio Synthesis of Musical Notes with WaveNet Autoencoders , 2017, ICML.
[64] Samy Bengio,et al. Tacotron: Towards End-to-End Speech Synthesis , 2017, INTERSPEECH.
[65] Yoshua Bengio,et al. SampleRNN: An Unconditional End-to-End Neural Audio Generation Model , 2016, ICLR.
[66] Zaïd Harchaoui,et al. Learning Features of Music from Scratch , 2016, ICLR.
[67] Aren Jansen,et al. CNN architectures for large-scale audio classification , 2016, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[68] Heiga Zen,et al. WaveNet: A Generative Model for Raw Audio , 2016, SSW.
[69] Sepp Hochreiter,et al. Fast and Accurate Deep Network Learning by Exponential Linear Units (ELUs) , 2015, ICLR.
[70] Thomas Brox,et al. U-Net: Convolutional Networks for Biomedical Image Segmentation , 2015, MICCAI.
[71] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.
[72] Daniel P. W. Ellis,et al. MIR_EVAL: A Transparent Implementation of Common MIR Metrics , 2014, ISMIR.