AudioLM: A Language Modeling Approach to Audio Generation
David Grangier, Damien Vincent, O. Pietquin, Dominik Roblek, E. Kharitonov, O. Teboul, Neil Zeghidour, Matthew Sharifi, Raphaël Marinier, M. Tagliasacchi, Zalán Borsos
[1] Jing Yu Koh, et al. Scaling Autoregressive Models for Content-Rich Text-to-Image Generation, 2022, Trans. Mach. Learn. Res.
[2] Devi Parikh, et al. Long Video Generation with Time-Agnostic VQGAN and Time-Sensitive Transformer, 2022, ECCV.
[3] Andrew M. Dai, et al. PaLM: Scaling Language Modeling with Pathways, 2022, J. Mach. Learn. Res.
[4] Marc van Zee, et al. Scaling Up Models and Data with t5x and seqio, 2022, ArXiv.
[5] Patrick von Platen, et al. XTREME-S: Evaluating Cross-lingual Speech Representations, 2022, INTERSPEECH.
[6] Benoît Sagot, et al. Are Discrete Units Necessary for Spoken Language Modeling?, 2022, IEEE Journal of Selected Topics in Signal Processing.
[7] Abdel-rahman Mohamed, et al. textless-lib: A Library for Textless Spoken Language Processing, 2022, NAACL.
[8] Oriol Vinyals, et al. General-Purpose, Long-Context Autoregressive Modeling with Perceiver AR, 2022, ICML.
[9] David F. Harwath, et al. Self-Supervised Representation Learning for Speech Using Visual Grounding and Masked Language Modeling, 2022, ArXiv.
[10] Renelito Delos Santos, et al. LaMDA: Language Models for Dialog Applications, 2022, ArXiv.
[11] Edresson Casanova, et al. YourTTS: Towards Zero-Shot Multi-Speaker TTS and Zero-Shot Voice Conversion for Everyone, 2021, ArXiv.
[12] Jing Yu Koh, et al. Vector-Quantized Image Modeling with Improved VQGAN, 2021, ICLR.
[13] Abdel-rahman Mohamed, et al. Text-Free Prosody-Aware Generative Spoken Language Modeling, 2021, ACL.
[14] Marco Tagliasacchi, et al. SoundStream: An End-to-End Neural Audio Codec, 2022, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[15] Po-Sen Huang, et al. Scaling Language Models: Methods, Analysis & Insights from Training Gopher, 2021, ArXiv.
[16] Junichi Yamagishi, et al. ASVspoof 2021: Automatic Speaker Verification Spoofing and Countermeasures Challenge Evaluation Plan, 2021, ArXiv.
[17] Chung-Cheng Chiu, et al. w2v-BERT: Combining Contrastive Learning and Masked Language Modeling for Self-Supervised Speech Pre-Training, 2021, IEEE Automatic Speech Recognition and Understanding Workshop (ASRU).
[18] Benjamin van Niekerk, et al. Analyzing Speaker Information in Self-Supervised Models to Improve Zero-Resource Speech Processing, 2021, INTERSPEECH.
[19] Minje Kim, et al. HARP-Net: Hyper-Autoencoded Reconstruction Propagation for Scalable Neural Audio Coding, 2021, IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA).
[20] Karen Livescu, et al. Layer-Wise Analysis of a Self-Supervised Speech Representation Model, 2021, ArXiv.
[21] Wojciech Zaremba, et al. Evaluating Large Language Models Trained on Code, 2021, ArXiv.
[22] Ruslan Salakhutdinov, et al. HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units, 2021, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[23] Ruslan Salakhutdinov, et al. HuBERT: How Much Can a Bad Teacher Benefit ASR Pre-Training?, 2021, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[24] Eugene Kharitonov, et al. The Zero Resource Speech Challenge 2021: Spoken Language Modelling, 2021, INTERSPEECH.
[25] Emmanuel Dupoux, et al. On Generative Spoken Language Modeling from Raw Audio, 2021, Transactions of the Association for Computational Linguistics.
[26] Emmanuel Dupoux, et al. Towards Unsupervised Learning of Speech Features in the Wild, 2021, IEEE Spoken Language Technology Workshop (SLT).
[27] Krzysztof Choromanski, et al. Rethinking Attention with Performers, 2020, ICLR.
[28] Bryan Catanzaro, et al. DiffWave: A Versatile Diffusion Model for Audio Synthesis, 2020, ICLR.
[29] Heiga Zen, et al. WaveGrad: Estimating Gradients for Waveform Generation, 2020, ICLR.
[30] Matthijs Douze, et al. Data Augmenting Contrastive Learning of Speech Representations in the Time Domain, 2020, IEEE Spoken Language Technology Workshop (SLT).
[31] Aurko Roy, et al. Efficient Content-Based Sparse Attention with Routing Transformers, 2020, TACL.
[32] Jaehyeon Kim, et al. HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis, 2020, NeurIPS.
[33] Dominik Roblek, et al. SEANet: A Multi-Modal Speech Enhancement Network, 2020, INTERSPEECH.
[34] Abdel-rahman Mohamed, et al. wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations, 2020, NeurIPS.
[35] Mark Chen, et al. Language Models are Few-Shot Learners, 2020, NeurIPS.
[36] Yu Zhang, et al. Conformer: Convolution-Augmented Transformer for Speech Recognition, 2020, INTERSPEECH.
[37] Ilya Sutskever, et al. Jukebox: A Generative Model for Music, 2020, ArXiv.
[38] Andrew Hines, et al. ViSQOL v3: An Open Source Production Ready Objective Speech and Audio Metric, 2020, Twelfth International Conference on Quality of Multimedia Experience (QoMEX).
[39] Abdel-rahman Mohamed, et al. Libri-Light: A Benchmark for ASR with Limited or No Supervision, 2019, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[40] Guillaume Lample, et al. Deep Learning for Symbolic Mathematics, 2019, ICLR.
[41] Colin Raffel, et al. Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer, 2019, J. Mach. Learn. Res.
[42] Michael Auli, et al. vq-wav2vec: Self-Supervised Learning of Discrete Speech Representations, 2019, ICLR.
[43] Erich Elsen, et al. High Fidelity Speech Synthesis with Adversarial Networks, 2019, ICLR.
[44] Yoshua Bengio, et al. MelGAN: Generative Adversarial Networks for Conditional Waveform Synthesis, 2019, NeurIPS.
[45] Omer Levy, et al. RoBERTa: A Robustly Optimized BERT Pretraining Approach, 2019, ArXiv.
[46] Minje Kim, et al. Cascaded Cross-Module Residual Learning towards Lightweight End-to-End Speech Coding, 2019, INTERSPEECH.
[47] Ali Razavi, et al. Generating Diverse High-Fidelity Images with VQ-VAE-2, 2019, NeurIPS.
[48] Marco Tagliasacchi, et al. Self-Supervised Audio Representation Learning for Mobile Devices, 2019, ArXiv.
[49] Douglas Eck, et al. Enabling Factorized Piano Music Modeling and Generation with the MAESTRO Dataset, 2018, ICLR.
[50] Ming-Wei Chang, et al. BERT: Pre-Training of Deep Bidirectional Transformers for Language Understanding, 2019, NAACL.
[51] Oriol Vinyals, et al. Representation Learning with Contrastive Predictive Coding, 2018, ArXiv.
[52] Erich Elsen, et al. Efficient Neural Audio Synthesis, 2018, ICML.
[53] Heiga Zen, et al. Parallel WaveNet: Fast High-Fidelity Speech Synthesis, 2017, ICML.
[54] Srihari Kankanahalli, et al. End-to-End Optimized Speech Coding with Deep Neural Networks, 2017, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[55] Oriol Vinyals, et al. Neural Discrete Representation Learning, 2017, NIPS.
[56] Lukasz Kaiser, et al. Attention Is All You Need, 2017, NIPS.
[57] Luca Benini, et al. Soft-to-Hard Vector Quantization for End-to-End Learned Compression of Images and Neural Networks, 2017, ArXiv.
[58] Heiga Zen, et al. WaveNet: A Generative Model for Raw Audio, 2016, SSW.
[59] Thomas Schatz. ABX-Discriminability Measures and Applications, 2016.
[60] Anil C. Kokaram, et al. ViSQOL: An Objective Speech Quality Model, 2015, EURASIP J. Audio Speech Music. Process.
[61] Sanjeev Khudanpur, et al. Librispeech: An ASR Corpus Based on Public Domain Audio Books, 2015, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[62] Aren Jansen, et al. Evaluating Speech Features with the Minimal-Pair ABX Task: Analysis of the Classical MFC/PLP Pipeline, 2013, INTERSPEECH.