Localization and classification of overlapping sound events based on spectrogram-keypoint using acoustic-sensor-network data

Automatic localization and classification of environmental sound events can greatly aid many human-centric IoT applications. However, as many papers have noted, environmental sound events in daily life are complex and hard to classify, especially when multiple sounds occur simultaneously. Unlike most works, which decompose and classify overlapping signals with a single unified model, we first decompose overlapping sound signals with a localization algorithm based on spectrogram keypoints. The located and clustered spectrogram keypoints are then reused for sound source classification. Our main contribution is a global cost function that synchronizes the time differences of arrival (TDOA) of the individual small spectrogram keypoints, after which the sound sources are located from the clustered keypoints. Two different classification models are then applied to these clustered keypoints to classify the sound sources. Our experiments show that the solution is both accurate and computationally inexpensive.
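
The abstract does not spell out the cost function, so the following is only a minimal sketch of the basic TDOA idea for a single pair of sensors. It uses plain cross-correlation on whole signals, not the keypoint-based method of the paper. The function name `estimate_tdoa`, the 16 kHz sample rate, and the synthetic noise burst are all assumptions made for illustration.

```python
import numpy as np

def estimate_tdoa(sig_a, sig_b, sample_rate):
    """Estimate the delay in seconds of sig_b relative to sig_a.

    A positive value means the sound reached sensor B later than sensor A.
    Hypothetical helper: plain cross-correlation, not the paper's method.
    """
    # Full cross-correlation; index 0 corresponds to lag -(len(sig_a) - 1).
    corr = np.correlate(sig_b, sig_a, mode="full")
    lag_samples = int(np.argmax(corr)) - (len(sig_a) - 1)
    return lag_samples / sample_rate

# Synthetic example (assumed values): a noise burst heard by two sensors,
# with sensor B receiving it 25 samples later than sensor A.
sample_rate = 16000
rng = np.random.default_rng(0)
burst = rng.standard_normal(100)

sig_a = np.zeros(1000)
sig_a[100:200] = burst
true_delay_samples = 25
sig_b = np.zeros(1000)
sig_b[100 + true_delay_samples:200 + true_delay_samples] = burst

tdoa = estimate_tdoa(sig_a, sig_b, sample_rate)
print(f"estimated TDOA: {tdoa * 1e3:.4f} ms")
```

Multiplying a TDOA by the speed of sound gives a difference in distance to the two sensors, which constrains the source to a hyperbola; several sensor pairs are needed to pin down a position. Handling several simultaneous sources, which is the setting the abstract addresses, requires more than this single-pair estimate.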
