Multi-label Few-shot Learning for Sound Event Recognition

Few-shot classification aims to generalize concepts learned from seen classes to unseen novel classes using only a few examples. Although significant progress has been made in few-shot classification, most approaches focus on the standard multi-class scenario and learn single-label embeddings of the labeled examples to classify the unlabeled ones. Moreover, state-of-the-art few-shot methods mostly adopt a metric-based architecture together with the so-called episode training strategy. While this approach works well for multi-class classification, it is hard to apply to the multi-label scenario because forming an episode becomes complex when a single example can carry several labels. In this paper, we propose a One-vs.-Rest episode selection strategy to mitigate this issue and apply it to the multi-label few-shot problem. Experiments on large-scale data from AudioSet show that models trained with our strategy extract semantic features under the multi-label setting.
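The abstract does not spell out how a One-vs.-Rest episode is built, so the Python sketch below is only one hypothetical reading, not the authors' procedure. It assumes that each clip carries a set of labels, that an episode is formed for a single target class by treating clips carrying that label as positives and all clips lacking it as negatives, and that queries are scored in the style of Prototypical Networks, using squared Euclidean distance to a positive and a negative prototype. The function names, toy labels, and 2-D embeddings are all illustrative assumptions.

```python
import random
from collections import defaultdict

import numpy as np


def build_label_index(labels):
    """Map each class name to the set of clip indices that carry it (multi-label)."""
    index = defaultdict(set)
    for i, clip_labels in enumerate(labels):
        for c in clip_labels:
            index[c].add(i)
    return index


def sample_one_vs_rest_episode(labels, target, k_shot, n_query, rng):
    """Hypothetical One-vs.-Rest episode: `target` vs. every clip lacking it.

    Positives carry `target` (possibly alongside other labels); negatives are
    clips without `target`. Raises ValueError if either side has fewer than
    k_shot + n_query clips.
    """
    index = build_label_index(labels)
    pos = sorted(index[target])
    neg = [i for i in range(len(labels)) if i not in index[target]]
    pos_pick = rng.sample(pos, k_shot + n_query)
    neg_pick = rng.sample(neg, k_shot + n_query)
    support = {"pos": pos_pick[:k_shot], "neg": neg_pick[:k_shot]}
    # Query items are (clip index, binary target) pairs: 1 = target present.
    query = [(i, 1) for i in pos_pick[k_shot:]] + [(i, 0) for i in neg_pick[k_shot:]]
    return support, query


def prototype_scores(emb, support, query_ids):
    """Prototypical-style binary scoring (an assumed choice of metric model).

    Returns, for each query, the softmax probability of "target present"
    computed from negative squared Euclidean distances to the two prototypes.
    """
    p_pos = emb[support["pos"]].mean(axis=0)
    p_neg = emb[support["neg"]].mean(axis=0)
    q = emb[query_ids]
    d_pos = ((q - p_pos) ** 2).sum(axis=1)
    d_neg = ((q - p_neg) ** 2).sum(axis=1)
    logits = np.stack([-d_neg, -d_pos], axis=1)
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
    return probs[:, 1]


# --- Toy usage (illustrative data only, not from AudioSet) ---
TOY_LABELS = [{"dog"}, {"dog", "speech"}, {"speech"}, {"car"},
              {"dog", "car"}, {"speech", "car"}, {"dog"}, {"car"}]
# Hypothetical embeddings: clips containing "dog" sit at [1, 0], others at [0, 1].
TOY_EMB = np.array([[1.0, 0.0] if "dog" in clip else [0.0, 1.0] for clip in TOY_LABELS])

if __name__ == "__main__":
    rng = random.Random(0)
    support, query = sample_one_vs_rest_episode(TOY_LABELS, "dog", k_shot=2, n_query=2, rng=rng)
    ids = [i for i, _ in query]
    probs = prototype_scores(TOY_EMB, support, ids)
    for (i, y), p in zip(query, probs):
        print(f"clip {i}: labels={sorted(TOY_LABELS[i])} target={y} p(dog)={p:.3f}")
```

Because each such episode asks only whether the target is present, a clip tagged with both "dog" and "speech" is a valid positive for "dog" without being forced into a single class, which sidesteps the episode-forming difficulty described in the abstract.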

Yi-Hsuan Yang | Szu-Yu Chou | Kai-Hsiang Cheng
