Sound Event Detection Using Graph Laplacian Regularization Based on Event Co-occurrence

The types of sound events that occur in a given situation are limited, and some sound events are likely to co-occur, for instance "dishes" and "glass jingling." In this paper, we propose a sound event detection technique that uses graph Laplacian regularization to take sound event co-occurrence into account. In the proposed method, sound event occurrences are represented as a graph whose nodes indicate the frequency of event occurrence and whose edges indicate the co-occurrence of sound events. This graph representation is then used for sound event modeling, which is optimized under an objective function with a regularization term that reflects the graph structure. Experimental results obtained using the TUT Sound Events 2016 development, TUT Sound Events 2017 development, and TUT Acoustic Scenes 2016 development datasets indicate that the proposed method improves the segment-based F1-score by 7.9 percentage points over the conventional CNN-BiGRU-based method. Moreover, the results show that the proposed method detects co-occurring sound events more accurately than the conventional method.
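To make the idea of a graph Laplacian regularization term concrete, the following is a minimal sketch and not the authors' implementation. It assumes frame-level binary labels, takes the edge weight between two events to be the number of frames in which both are active, and uses the standard quadratic form y^T L y with L = D - A as the penalty. The function names `build_cooccurrence_graph`, `graph_laplacian`, and `laplacian_penalty` are hypothetical, and the abstract does not state how the paper weights edges or combines the penalty with the detection loss.

```python
import numpy as np


def build_cooccurrence_graph(labels):
    """Return (node_frequency, adjacency) from a (frames, events) binary matrix.

    Assumption: edge weight = number of frames where both events are active.
    """
    labels = np.asarray(labels, dtype=float)
    counts = labels.T @ labels            # diagonal: per-event frame counts
    node_frequency = np.diag(counts).copy()
    adjacency = counts.copy()
    np.fill_diagonal(adjacency, 0.0)      # no self-loops on the graph
    return node_frequency, adjacency


def graph_laplacian(adjacency):
    """Unnormalized graph Laplacian L = D - A."""
    degree = np.diag(adjacency.sum(axis=1))
    return degree - adjacency


def laplacian_penalty(predictions, laplacian):
    """Sum over frames of y_t^T L y_t for (frames, events) predictions.

    Equals 0.5 * sum_ij A_ij * (y_i - y_j)^2 per frame, so it is small when
    strongly co-occurring events receive similar activations.
    """
    predictions = np.asarray(predictions, dtype=float)
    return float(np.einsum("ti,ij,tj->", predictions, laplacian, predictions))


# Toy data: events 0 and 1 always co-occur, event 2 occurs alone.
toy_labels = [[1, 1, 0],
              [1, 1, 0],
              [0, 0, 1]]
frequency, adjacency = build_cooccurrence_graph(toy_labels)
laplacian = graph_laplacian(adjacency)

# Predicting both co-occurring events costs nothing; predicting only one is penalized.
penalty_both = laplacian_penalty([[1, 1, 0]], laplacian)
penalty_one = laplacian_penalty([[1, 0, 0]], laplacian)
```

In a training setup, a term of this kind would typically be scaled by a weight and added to the usual detection loss; how the paper does this is not specified in the abstract.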
