Recognition and retrieval of sound events using sparse coding convolutional neural network

This paper proposes a novel deep convolutional neural network (CNN), called sparse coding convolutional neural network (SC-CNN), to address the problem of sound event recognition and retrieval task. Unlike the general framework of a CNN, in which feature learning process is performed hierarchically, the proposed framework models the whole memorizing procedures in the human brain, including encoding, storage, and recollection. Sound data from the RWCP sound scene dataset with added noise from NOISEX-92 noise dataset are used to compare the performance of the proposed system with the state-of-the-art baselines. The experimental results indicated that the proposed SC-CNN outperformed the state-of-the-art systems in sound event recognition and retrieval. In the sound event recognition task, the proposed system achieved an accuracy of 94.6%, 100% and 100% under 0db, 10db and clean noise conditions, respectively. In the retrieval task, the proposed system improves the mAP rate of the general CNN by approximately 6%.

[1]  Victor S. Lempitsky,et al.  Neural Codes for Image Retrieval , 2014, ECCV.

[2]  Keansub Lee,et al.  Minimal-impact audio-based personal archives , 2004, CARPE'04.

[3]  Daniel P. W. Ellis,et al.  Locating singing voice segments within music signals , 2001, Proceedings of the 2001 IEEE Workshop on the Applications of Signal Processing to Audio and Acoustics (Cat. No.01TH8575).

[4]  Yan Song,et al.  Robust sound event recognition using convolutional neural networks , 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[5]  Chloé Clavel,et al.  Events Detection for an Audio-Based Surveillance System , 2005, 2005 IEEE International Conference on Multimedia and Expo.

[6]  Francesc Alías,et al.  Two-step detection of water sound events for the diagnostic and monitoring of dementia , 2013, 2013 IEEE International Conference on Multimedia and Expo (ICME).

[7]  Jonathan William Dennis,et al.  Sound event recognition in unstructured environments using spectrogram image processing , 2014 .

[8]  Takumi Kobayashi,et al.  Acoustic feature extraction by statistics based local binary pattern for environmental sound classification , 2014, 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[9]  Janto Skowronek,et al.  Automatic surveillance of the acoustic activity in our living environment , 2005, 2005 IEEE International Conference on Multimedia and Expo.

[10]  Satoshi Nakamura,et al.  Acoustical Sound Database in Real Environments for Sound Scene Understanding and Hands-Free Speech Recognition , 2000, LREC.

[11]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[12]  Yan Song,et al.  Robust Sound Event Classification Using Deep Neural Networks , 2015, IEEE/ACM Transactions on Audio, Speech, and Language Processing.

[13]  Chng Eng Siong,et al.  Image Feature Representation of the Subband Power Distribution for Robust Sound Event Classification , 2013, IEEE Transactions on Audio, Speech, and Language Processing.

[14]  C.-C. Jay Kuo,et al.  Where am I? Scene Recognition for Mobile Robots using Audio Features , 2006, 2006 IEEE International Conference on Multimedia and Expo.

[15]  Chang-Hong Lin,et al.  Gabor-Based Nonuniform Scale-Frequency Map for Environmental Sound Classification in Home Automation , 2014, IEEE Transactions on Automation Science and Engineering.

[16]  Sridhar Krishnan,et al.  Time–Frequency Matrix Feature Extraction and Classification of Environmental Audio Signals , 2011, IEEE Transactions on Audio, Speech, and Language Processing.

[17]  Andrey Temko,et al.  Acoustic Event Detection and Classification , 2007, Computers in the Human Interaction Loop.

[18]  Haizhou Li,et al.  Spectrogram Image Feature for Sound Event Classification in Mismatched Conditions , 2011, IEEE Signal Processing Letters.

[19]  Heikki Huttunen,et al.  Recurrent neural networks for polyphonic sound event detection in real life recordings , 2016, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[20]  Nikos Fakotakis,et al.  On acoustic surveillance of hazardous situations , 2009, 2009 IEEE International Conference on Acoustics, Speech and Signal Processing.