HEAR: Holistic Evaluation of Audio Representations
Joseph P. Turian | J. Salamon | Jesse Engel | Yonatan Bisk | Björn Schuller | Shinji Watanabe | B. Raj | Zeyu Jin | Gissel Velarde | G. Tzanetakis | Dorien Herremans | P. Esling | Eduardo Fonseca | C. Steinmetz | Pranay Manocha | Jordie Shier | H. Khan | C. Malloy | K. McNally | Max Henry | Nicolas Pinto | Camille Noufi | Christian Clough
[1] Hung-yi Lee,et al. The Ability of Self-Supervised Speech Models for Audio Representations , 2022, 2209.12900.
[2] Karl El Hajal,et al. BYOL-S: Learning Self-supervised Speech Representations by Bootstrapping , 2022, ArXiv.
[3] Mashrur M. Morshed,et al. Learning Audio Representations with MLPs , 2022, ArXiv.
[4] Priya Goyal,et al. Vision Models Are More Robust And Fair When Pretrained On Uncurated Images Without Supervision , 2022, ArXiv.
[5] Aäron van den Oord,et al. Towards Learning Universal Audio Representations , 2021, ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[6] J. Bello,et al. Wav2CLIP: Learning Robust Audio Representations from Clip , 2021, ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[7] Efficient Training of Audio Transformers with Patchout , 2021, ArXiv.
[8] Yann LeCun,et al. VICReg: Variance-Invariance-Covariance Regularization for Self-Supervised Learning , 2021, ICLR.
[9] X. Serra,et al. FSD50K: An Open Dataset of Human-Labeled Sound Events , 2020, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[10] P. Biecek,et al. Interpretable meta-score for model performance , 2020, Nature Machine Intelligence.
[11] Andres Ferraro,et al. Improving Sound Event Classification by Increasing Shift Invariance in Convolutional Neural Networks , 2021, ArXiv.
[12] Ruslan Salakhutdinov,et al. HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units , 2021, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[13] Quoc V. Le,et al. Pay Attention to MLPs , 2021, NeurIPS.
[14] Andy T. Liu,et al. SUPERB: Speech processing Universal PERformance Benchmark , 2021, Interspeech.
[15] Aäron van den Oord,et al. Multimodal Self-Supervised Learning of General Audio Representations , 2021, ArXiv.
[16] Emmanouil Benetos,et al. Revisiting the Onsets and Frames Model with Additive Attention , 2021, 2021 International Joint Conference on Neural Networks (IJCNN).
[17] James R. Glass,et al. AST: Audio Spectrogram Transformer , 2021, Interspeech.
[18] Andrew N. Carr,et al. Self-Supervised Learning of Audio Representations From Permutations With Differentiable Ranking , 2021, IEEE Signal Processing Letters.
[19] K. Kashino,et al. BYOL for Audio: Self-Supervised Learning for General-Purpose Audio Representation , 2021, 2021 International Joint Conference on Neural Networks (IJCNN).
[20] Aäron van den Oord,et al. Multi-Format Contrastive Learning of Audio Representations , 2021, ArXiv.
[21] Yann LeCun,et al. Barlow Twins: Self-Supervised Learning via Redundancy Reduction , 2021, ICML.
[22] Ilya Sutskever,et al. Learning Transferable Visual Models From Natural Language Supervision , 2021, ICML.
[23] James R. Glass,et al. PSLA: Improving Audio Tagging With Pretraining, Sampling, Labeling, and Aggregation , 2021, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[24] Emmanuel Dupoux,et al. On Generative Spoken Language Modeling from Raw Audio , 2021, Transactions of the Association for Computational Linguistics.
[25] Marco Tagliasacchi,et al. LEAF: A Learnable Frontend for Audio Classification , 2021, ICLR.
[26] Emmanuel Dupoux,et al. VoxPopuli: A Large-Scale Multilingual Speech Corpus for Representation Learning, Semi-Supervised Learning and Interpretation , 2021, ACL.
[27] Jorgen Valk,et al. VOXLINGUA107: A Dataset for Spoken Language Recognition , 2020, 2021 IEEE Spoken Language Technology Workshop (SLT).
[28] Xinlei Chen,et al. Exploring Simple Siamese Representation Learning , 2020, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[29] Noel E. O'Connor,et al. Unsupervised Contrastive Learning of Sound Event Representations , 2020, ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[30] S. Gelly,et al. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale , 2020, ICLR.
[31] Neil Zeghidour,et al. Contrastive Learning of General-Purpose Audio Representations , 2020, ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[32] Abdel-rahman Mohamed,et al. wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations , 2020, NeurIPS.
[33] Pierre H. Richemond,et al. Bootstrap Your Own Latent: A New Approach to Self-Supervised Learning , 2020, NeurIPS.
[34] Chen Sun,et al. What makes for good views for contrastive learning , 2020, NeurIPS.
[35] Andrew Zisserman,et al. Vggsound: A Large-Scale Audio-Visual Dataset , 2020, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[36] Aren Jansen,et al. Towards Learning a Universal Non-Semantic Representation of Speech , 2020, INTERSPEECH.
[37] Geoffrey E. Hinton,et al. A Simple Framework for Contrastive Learning of Visual Representations , 2020, ICML.
[38] Chenjie Gu,et al. DDSP: Differentiable Digital Signal Processing , 2020, ICLR.
[39] Mark D. Plumbley,et al. PANNs: Large-Scale Pretrained Audio Neural Networks for Audio Pattern Recognition , 2019, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[40] Byron C. Wallace,et al. ERASER: A Benchmark to Evaluate Rationalized NLP Models , 2019, ACL.
[41] André Susano Pinto,et al. A Large-scale Study of Representation Learning with the Visual Task Adaptation Benchmark , 2019, 1910.04867.
[42] Quoc V. Le,et al. EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks , 2019, ICML.
[43] Justin Salamon,et al. Look, Listen, and Learn More: Design Choices for Deep Audio Embeddings , 2019, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[44] Abhinav Gupta,et al. Scaling and Benchmarking Self-Supervised Visual Representation Learning , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[45] Omer Levy,et al. SuperGLUE: A Stickier Benchmark for General-Purpose Language Understanding Systems , 2019, NeurIPS.
[46] Ronan Collobert,et al. wav2vec: Unsupervised Pre-training for Speech Recognition , 2019, INTERSPEECH.
[47] Yoshua Bengio,et al. Learning Problem-agnostic Speech Representations from Multiple Self-supervised Tasks , 2019, INTERSPEECH.
[48] Simone Orcioni,et al. Audio-based Identification of Beehive States , 2018, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[49] Douglas Eck,et al. Enabling Factorized Piano Music Modeling and Generation with the MAESTRO Dataset , 2018, ICLR.
[50] Xavier Serra,et al. Randomly Weighted CNNs for (Music) Audio Classification , 2018, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[51] Ming-Wei Chang,et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding , 2019, NAACL.
[52] Benjamin Recht,et al. Do CIFAR-10 Classifiers Generalize to CIFAR-10? , 2018, ArXiv.
[53] Omer Levy,et al. GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding , 2018, BlackboxNLP@EMNLP.
[54] Pete Warden,et al. Speech Commands: A Dataset for Limited-Vocabulary Speech Recognition , 2018, ArXiv.
[55] Erich Elsen,et al. Efficient Neural Audio Synthesis , 2018, ICML.
[56] Jong Wook Kim,et al. Crepe: A Convolutional Representation for Pitch Estimation , 2018, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[57] Mathieu Lagrange,et al. Detection and Classification of Acoustic Scenes and Events: Outcome of the DCASE 2016 Challenge , 2018, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[58] Emanuel A. P. Habets,et al. Classification vs. Regression in Supervised Learning for Single Channel Speaker Count Estimation , 2017, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[59] Colin Raffel,et al. Onsets and Frames: Dual-Objective Piano Transcription , 2017, ISMIR.
[60] Bryan Pardo,et al. Vocal Imitation Set: a dataset of vocally imitated sound events using the AudioSet ontology , 2018, DCASE.
[61] Björn W. Schuller,et al. Snore Sound Classification Using Image-Based Deep Spectrum Features , 2017, INTERSPEECH.
[62] Andrew Zisserman,et al. Look, Listen and Learn , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[63] Bo Chen,et al. MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications , 2017, ArXiv.
[64] Karen Simonyan,et al. Neural Audio Synthesis of Musical Notes with WaveNet Autoencoders , 2017, ICML.
[65] Aren Jansen,et al. Audio Set: An ontology and human-labeled dataset for audio events , 2017, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[66] Yoshua Bengio,et al. SampleRNN: An Unconditional End-to-End Neural Audio Generation Model , 2016, ICLR.
[67] Aren Jansen,et al. CNN architectures for large-scale audio classification , 2016, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[68] Justin Salamon,et al. Deep Convolutional Neural Networks and Data Augmentation for Environmental Sound Classification , 2016, IEEE Signal Processing Letters.
[69] M. Picheny,et al. Comparison of Parametric Representation for Monosyllabic Word Recognition in Continuously Spoken Sentences , 2017.
[70] Gerhard Widmer,et al. On the Potential of Simple Framewise Approaches to Piano Transcription , 2016, ISMIR.
[71] Antonio Torralba,et al. SoundNet: Learning Sound Representations from Unlabeled Video , 2016, NIPS.
[72] Heiga Zen,et al. WaveNet: A Generative Model for Raw Audio , 2016, SSW.
[73] Annamaria Mesaros,et al. Metrics for Polyphonic Sound Event Detection , 2016.
[74] Paolo Favaro,et al. Unsupervised Learning of Visual Representations by Solving Jigsaw Puzzles , 2016, ECCV.
[75] George Trigeorgis,et al. Adieu features? End-to-end speech emotion recognition using a deep convolutional recurrent network , 2016, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[76] Karol J. Piczak. ESC: Dataset for Environmental Sound Classification , 2015, ACM Multimedia.
[77] Sanjeev Khudanpur,et al. Librispeech: An ASR corpus based on public domain audio books , 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[78] Ragini Verma,et al. CREMA-D: Crowd-Sourced Emotional Multimodal Actors Dataset , 2014, IEEE Transactions on Affective Computing.
[79] Ajay Srinivasamurthy,et al. A study of instrument-wise onset detection in Beijing Opera percussion ensembles , 2014, 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[80] Joakim Andén,et al. Deep Scattering Spectrum , 2013, IEEE Transactions on Signal Processing.
[81] J. Stephen Downie,et al. Ten years of MIREX: reflections, challenges and opportunities , 2014, ISMIR 2014.
[82] Bob L. Sturm. The GTZAN dataset: Its contents, its faults, their effects on evaluation, and its future use , 2013, ArXiv.
[83] Hema A. Murthy,et al. Modal analysis and transcription of strokes of the mridangam using non-negative matrix factorization , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.
[84] Zhenghao Chen,et al. On Random Weights and Unsupervised Feature Learning , 2011, ICML.
[85] Jason Weston,et al. Natural Language Processing (Almost) from Scratch , 2011, J. Mach. Learn. Res.
[86] Yoshua Bengio,et al. Word Representations: A Simple and General Method for Semi-Supervised Learning , 2010, ACL.
[87] Yoshua Bengio,et al. Understanding the difficulty of training deep feedforward neural networks , 2010, AISTATS.
[88] Christian Schörkhuber. Constant-Q Transform Toolbox for Music Processing , 2010.
[89] Fei-Fei Li,et al. ImageNet: A large-scale hierarchical image database , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.
[90] Yann LeCun,et al. Learning a similarity metric discriminatively, with application to face verification , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).
[91] George Tzanetakis,et al. Musical genre classification of audio signals , 2002, IEEE Trans. Speech Audio Process.
[92] Yoshua Bengio,et al. A Neural Probabilistic Language Model , 2003, J. Mach. Learn. Res.
[93] Beth Logan,et al. Mel Frequency Cepstral Coefficients for Music Modeling , 2000, ISMIR.