[1] Andrew Zisserman,et al. Self-Supervised MultiModal Versatile Networks , 2020, NeurIPS.
[2] Andreas Dengel,et al. ESResNet: Environmental Sound Classification Based on Visual Domain Models , 2020, 2020 25th International Conference on Pattern Recognition (ICPR).
[3] Aren Jansen,et al. Audio Set: An ontology and human-labeled dataset for audio events , 2017, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[4] Heng Tao Shen,et al. Enhancing Audio-Visual Association with Self-Supervised Curriculum Learning , 2021, AAAI.
[5] Shih-Fu Chang,et al. VATT: Transformers for Multimodal Self-Supervised Learning from Raw Video, Audio and Text , 2021, NeurIPS.
[6] Karol J. Piczak. ESC: Dataset for Environmental Sound Classification , 2015, ACM Multimedia.
[7] Ilya Sutskever,et al. Language Models are Unsupervised Multitask Learners , 2019.
[8] Jian Sun,et al. Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[9] Tuomas Virtanen,et al. Zero-Shot Audio Classification Via Semantic Embeddings , 2020, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[10] Aäron van den Oord,et al. Multimodal Self-Supervised Learning of General Audio Representations , 2021, ArXiv.
[11] Tatsuya Harada,et al. Learning from Between-class Examples for Deep Sound Recognition , 2017, ICLR.
[12] François Chollet,et al. Xception: Deep Learning with Depthwise Separable Convolutions , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[13] Lukasz Kaiser,et al. Attention Is All You Need , 2017, NIPS.
[14] Ilya Sutskever,et al. Learning Transferable Visual Models From Natural Language Supervision , 2021, ICML.
[15] Kamalesh Palanisamy,et al. Rethinking CNN Models for Audio Classification , 2020, ArXiv.
[16] Georg Heigold,et al. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale , 2021, ICLR.
[17] Tatsuya Harada,et al. Learning environmental sounds with end-to-end convolutional neural network , 2017, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[18] Karol J. Piczak. Environmental sound classification with convolutional neural networks , 2015, 2015 IEEE 25th International Workshop on Machine Learning for Signal Processing (MLSP).
[19] Sergey Verbitskiy,et al. ERANNs: Efficient Residual Audio Neural Networks for Audio Pattern Recognition , 2021, ArXiv.
[20] Andreas Dengel,et al. ESResNe(X)t-fbsp: Learning Robust Time-Frequency Transformation of Audio , 2021, 2021 International Joint Conference on Neural Networks (IJCNN).
[21] Justin Salamon,et al. A Dataset and Taxonomy for Urban Sound Research , 2014, ACM Multimedia.
[22] Justin Salamon,et al. Deep Convolutional Neural Networks and Data Augmentation for Environmental Sound Classification , 2016, IEEE Signal Processing Letters.
[23] Aleksandr Petiushko,et al. MDMMT: Multidomain Multimodal Transformer for Video Retrieval , 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).
[24] Gunhee Kim,et al. AudioCaps: Generating Captions for Audios in The Wild , 2019, NAACL.
[25] Boris Polyak,et al. Acceleration of stochastic approximation by averaging , 1992.
[26] Anurag Kumar,et al. A Sequential Self Teaching Approach for Improving Generalization in Sound Event Recognition , 2020, ICML.
[27] Fei-Fei Li,et al. ImageNet: A large-scale hierarchical image database , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.
[28] Shahriar Nirjon,et al. SoundSemantics: Exploiting Semantic Knowledge in Text for Embedded Acoustic Event Classification , 2019, 2019 18th ACM/IEEE International Conference on Information Processing in Sensor Networks (IPSN).
[29] Hemant A. Patil,et al. Unsupervised Filterbank Learning Using Convolutional Restricted Boltzmann Machine for Environmental Sound Classification , 2017, INTERSPEECH.
[30] Daniel L. Rubin,et al. Differential Data Augmentation Techniques for Medical Imaging Classification Tasks , 2017, AMIA.
[31] Tuomas Virtanen,et al. Zero-Shot Audio Classification Based On Class Label Embeddings , 2019, 2019 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA).
[32] James Glass,et al. AST: Audio Spectrogram Transformer , 2021, Interspeech 2021.