Non-Negative Matrix Factorization-Convolutional Neural Network (NMF-CNN) For Sound Event Detection

The main scientific question of this year's DCASE challenge, Task 4 - Sound Event Detection in Domestic Environments, is which types of data (strongly labeled synthetic data, weakly labeled data, unlabeled in-domain data) are required to achieve the best-performing system. In this paper, we propose a deep learning model that integrates Non-Negative Matrix Factorization (NMF) with a Convolutional Neural Network (CNN). The key idea of this integration is to use NMF to provide approximate strong labels for the weakly labeled data. This integration achieved a higher event-based F1-score than the baseline system (evaluation dataset: 30.39% vs. 23.7%; validation dataset: 31% vs. 25.8%). In a comparison of validation results with those of other participants, the proposed system ranked 8th among 19 teams (including the baseline system) in this year's Task 4 challenge.
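The idea of deriving approximate strong labels from NMF can be illustrated with a minimal sketch. It is not the system described in the paper: the rank, the threshold, the toy spectrogram, and the function names `nmf` and `approximate_strong_label` are all assumptions made for illustration. A non-negative spectrogram V is factorized as V ≈ WH with the standard multiplicative updates, and the frames where the activations in H are large are marked as active.

```python
import numpy as np

def nmf(V, rank, n_iter=200, eps=1e-9, seed=0):
    """Factorize non-negative V (freq x frames) as W @ H using
    multiplicative updates for the Euclidean cost."""
    rng = np.random.default_rng(seed)
    n_freq, n_frames = V.shape
    W = rng.random((n_freq, rank)) + eps
    H = rng.random((rank, n_frames)) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

def approximate_strong_label(V, rank=1, threshold=0.5):
    """Return a boolean per-frame mask (hypothetical helper).

    Assumption: the clip carries a weak label saying the event is
    present somewhere; frames with high NMF activation are taken
    as the approximate location of that event."""
    W, H = nmf(V, rank)
    # Weight each activation row by the energy of its basis vector,
    # so the scale ambiguity between W and H does not matter.
    activation = (W.sum(axis=0)[:, None] * H).sum(axis=0)
    activation = activation / (activation.max() + 1e-12)
    return activation >= threshold

# Toy spectrogram (assumed data): 8 frequency bins, 20 frames,
# low-level noise plus one event occupying frames 5 to 11.
rng = np.random.default_rng(1)
V = 0.01 * rng.random((8, 20))
V[2:5, 5:12] += 1.0

mask = approximate_strong_label(V)
print(np.flatnonzero(mask))  # indices of frames marked as active
```

In a full pipeline, a frame-level mask of this kind could serve as the training target for a CNN on weakly labeled clips. How the paper chooses the dictionaries, the threshold, and the handling of overlapping event classes is not stated in the abstract.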
