BEATs: Audio Pre-Training with Acoustic Tokenizers
暂无分享,去创建一个
[1] James R. Glass,et al. Contrastive Audio-Visual Masked Autoencoder , 2022, ICLR.
[2] Li Dong,et al. Image as a Foreign Language: BEiT Pretraining for All Vision and Vision-Language Tasks , 2022, ArXiv.
[3] Li Dong,et al. BEiT v2: Masked Image Modeling with Vector-Quantized Visual Tokenizers , 2022, ArXiv.
[4] Michael Auli,et al. Masked Autoencoders that Listen , 2022, NeurIPS.
[5] Benjamin Elizalde,et al. CLAP: Learning Audio Concepts From Natural Language Supervision , 2022, ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[6] Dading Chong,et al. Masked Spectrogram Prediction for Self-Supervised Audio Pre-Training , 2022, IEEE International Conference on Acoustics, Speech, and Signal Processing.
[7] K. Kashino,et al. Masked Spectrogram Modeling using Masked Autoencoders for Learning General-purpose Audio Representation , 2022, HEAR@NeurIPS.
[8] David F. Harwath,et al. MAE-AST: Masked Autoencoding Audio Spectrogram Transformer , 2022, INTERSPEECH.
[9] Li Dong,et al. DeepNet: Scaling Transformers to 1, 000 Layers , 2022, ArXiv.
[10] Michael Auli,et al. data2vec: A General Framework for Self-supervised Learning in Speech, Vision and Language , 2022, ICML.
[11] Yonghui Wu,et al. Self-supervised Learning with Random-projection Quantizer for Speech Recognition , 2022, ICML.
[12] S. Dubnov,et al. HTS-AT: A Hierarchical Token-Semantic Audio Transformer for Sound Classification and Detection , 2022, ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[13] Ross B. Girshick,et al. Masked Autoencoders Are Scalable Vision Learners , 2021, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[14] Jinyu Li,et al. WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing , 2021, IEEE Journal of Selected Topics in Signal Processing.
[15] J. Bello,et al. Wav2CLIP: Learning Robust Audio Representations from Clip , 2021, ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[16] James R. Glass,et al. SSAST: Self-Supervised Audio Spectrogram Transformer , 2021, AAAI.
[17] Jan Schlüter,et al. Efficient Training of Audio Transformers with Patchout , 2021, INTERSPEECH.
[18] Jing Yu Koh,et al. Vector-quantized Image Modeling with Improved VQGAN , 2021, ICLR.
[19] Tara N. Sainath,et al. BigSSL: Exploring the Frontier of Large-Scale Semi-Supervised Learning for Automatic Speech Recognition , 2021, IEEE Journal of Selected Topics in Signal Processing.
[20] Li Dong,et al. XLM-E: Cross-lingual Language Model Pre-training via ELECTRA , 2021, ACL.
[21] C. Schmid,et al. Attention Bottlenecks for Multimodal Fusion , 2021, NeurIPS.
[22] Federico Raue,et al. Audioclip: Extending Clip to Image, Text and Audio , 2021, ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[23] Li Dong,et al. BEiT: BERT Pre-Training of Image Transformers , 2021, ICLR.
[24] Ruslan Salakhutdinov,et al. HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units , 2021, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[25] S. Verbitskiy,et al. ERANNs: Efficient Residual Audio Neural Networks for Audio Pattern Recognition , 2021, Pattern Recognit. Lett..
[26] Andy T. Liu,et al. SUPERB: Speech processing Universal PERformance Benchmark , 2021, Interspeech.
[27] James R. Glass,et al. AST: Audio Spectrogram Transformer , 2021, Interspeech.
[28] Andrew N. Carr,et al. Self-Supervised Learning of Audio Representations From Permutations With Differentiable Ranking , 2021, IEEE Signal Processing Letters.
[29] K. Kashino,et al. BYOL for Audio: Self-Supervised Learning for General-Purpose Audio Representation , 2021, IEEE International Joint Conference on Neural Network.
[30] Aäron van den Oord,et al. Multi-Format Contrastive Learning of Audio Representations , 2021, ArXiv.
[31] Ilya Sutskever,et al. Learning Transferable Visual Models From Natural Language Supervision , 2021, ICML.
[32] Alec Radford,et al. Zero-Shot Text-to-Image Generation , 2021, ICML.
[33] James R. Glass,et al. PSLA: Improving Audio Tagging With Pretraining, Sampling, Labeling, and Aggregation , 2021, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[34] Chandan K. A. Reddy,et al. Interspeech 2021 Deep Noise Suppression Challenge , 2021, Interspeech.
[35] Noel E. O'Connor,et al. Unsupervised Contrastive Learning of Sound Event Representations , 2020, ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[36] S. Gelly,et al. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale , 2020, ICLR.
[37] Neil Zeghidour,et al. Contrastive Learning of General-Purpose Audio Representations , 2020, ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[38] Yalda Mohsenzadeh,et al. CLAR: Contrastive Learning of Auditory Representations , 2020, AISTATS.
[39] Abdel-rahman Mohamed,et al. wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations , 2020, NeurIPS.
[40] Pierre H. Richemond,et al. Bootstrap Your Own Latent: A New Approach to Self-Supervised Learning , 2020, NeurIPS.
[41] Marco Tagliasacchi,et al. Pre-Training Audio Representations With Self-Supervision , 2020, IEEE Signal Processing Letters.
[42] Mark D. Plumbley,et al. PANNs: Large-Scale Pretrained Audio Neural Networks for Audio Pattern Recognition , 2019, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[43] Kevin Gimpel,et al. ALBERT: A Lite BERT for Self-supervised Learning of Language Representations , 2019, ICLR.
[44] Omer Levy,et al. RoBERTa: A Robustly Optimized BERT Pretraining Approach , 2019, ArXiv.
[45] Quoc V. Le,et al. EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks , 2019, ICML.
[46] Quoc V. Le,et al. SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition , 2019, INTERSPEECH.
[47] Ronan Collobert,et al. wav2vec: Unsupervised Pre-training for Speech Recognition , 2019, INTERSPEECH.
[48] Yoshua Bengio,et al. Learning Speaker Representations with Mutual Information , 2018, INTERSPEECH.
[49] Karen Simonyan,et al. The challenge of realistic music generation: modelling raw audio at scale , 2018, NeurIPS.
[50] Pete Warden,et al. Speech Commands: A Dataset for Limited-Vocabulary Speech Recognition , 2018, ArXiv.
[51] Frank Hutter,et al. Decoupled Weight Decay Regularization , 2017, ICLR.
[52] Oriol Vinyals,et al. Neural Discrete Representation Learning , 2017, NIPS.
[53] Hongyi Zhang,et al. mixup: Beyond Empirical Risk Minimization , 2017, ICLR.
[54] Lukasz Kaiser,et al. Attention is All you Need , 2017, NIPS.
[55] Aren Jansen,et al. Audio Set: An ontology and human-labeled dataset for audio events , 2017, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[56] Karol J. Piczak. ESC: Dataset for Environmental Sound Classification , 2015, ACM Multimedia.
[57] Geoffrey E. Hinton,et al. Distilling the Knowledge in a Neural Network , 2015, ArXiv.
[58] Jeffrey Dean,et al. Efficient Estimation of Word Representations in Vector Space , 2013, ICLR.
[59] Fei-Fei Li,et al. ImageNet: A large-scale hierarchical image database , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.
[60] Carlos Busso,et al. IEMOCAP: interactive emotional dyadic motion capture database , 2008, Lang. Resour. Evaluation.
[61] Yoshua Bengio,et al. Convolutional networks for images, speech, and time series , 1998 .
[62] Y. Mohsenzadeh,et al. Contrastive Learning of Auditory Representations , 2021 .
[63] Stephen Lin,et al. Swin Transformer: Hierarchical Vision Transformer using Shifted Windows , 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).
[64] Ming-Wei Chang,et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding , 2019, NAACL.
[65] Nitish Srivastava,et al. Dropout: a simple way to prevent neural networks from overfitting , 2014, J. Mach. Learn. Res..
[66] Geoffrey E. Hinton,et al. Visualizing Data using t-SNE , 2008 .