Sound Events Recognition and Retrieval Using Multi-Convolutional-Channel Sparse Coding Convolutional Neural Networks

This article proposes two novel deep convolutional neural networks (CNN), which are called the sparse coding convolutional neural network (SC-CNN) and the multi-convolutional-channel SC-CNN (MSC-CNN), to address the sound event recognition and retrieval problem. Unlike the general framework of a CNN, in which the feature learning process is performed hierarchically, the proposed framework models the whole memorization process in the human brain, including encoding, storage, and recollection. In particular, the MSC-CNN is designed to recognize multiple sound events that occur simultaneously. The experimental results indicate that the proposed SC-CNN and MSC-CNN outperforms the state-of-the-art systems in sound event recognition and retrieval.

[1]  Jen-Hao Hsiao,et al.  Deep learning of binary hash codes for fast image retrieval , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[2]  Chung-Hsien Wu,et al.  Sound Event Recognition Using Auditory-Receptive-Field Binary Pattern and Hierarchical-Diving Deep Belief Network , 2018, IEEE/ACM Transactions on Audio, Speech, and Language Processing.

[3]  C.-C. Jay Kuo,et al.  Where am I? Scene Recognition for Mobile Robots using Audio Features , 2006, 2006 IEEE International Conference on Multimedia and Expo.

[4]  Franz Pernkopf,et al.  Gated Recurrent Networks applied to Acoustic Scene Classification , 2016, DCASE.

[5]  Janto Skowronek,et al.  Automatic surveillance of the acoustic activity in our living environment , 2005, 2005 IEEE International Conference on Multimedia and Expo.

[6]  Andrey Temko,et al.  ACOUSTIC EVENT DETECTION AND CLASSIFICATION IN SMART-ROOM ENVIRONMENTS: EVALUATION OF CHIL PROJECT SYSTEMS , 2006 .

[7]  Chih-Jen Lin,et al.  LIBSVM: A library for support vector machines , 2011, TIST.

[8]  Jungmin Lee,et al.  Attention-based Ensemble for Deep Metric Learning , 2018, ECCV.

[9]  Jonathan William Dennis,et al.  Sound event recognition in unstructured environments using spectrogram image processing , 2014 .

[10]  Yan Song,et al.  Robust Sound Event Classification Using Deep Neural Networks , 2015, IEEE/ACM Transactions on Audio, Speech, and Language Processing.

[11]  Takumi Kobayashi,et al.  Acoustic feature extraction by statistics based local binary pattern for environmental sound classification , 2014, 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[12]  Takumi Kobayashi,et al.  Kernel discriminant analysis for environmental sound recognition based on acoustic subspace , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.

[13]  Daniel P. W. Ellis,et al.  Locating singing voice segments within music signals , 2001, Proceedings of the 2001 IEEE Workshop on the Applications of Signal Processing to Audio and Acoustics (Cat. No.01TH8575).

[14]  Yan Song,et al.  Robust sound event recognition using convolutional neural networks , 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[15]  Chng Eng Siong,et al.  Image Feature Representation of the Subband Power Distribution for Robust Sound Event Classification , 2013, IEEE Transactions on Audio, Speech, and Language Processing.

[16]  Satoshi Nakamura,et al.  Acoustical Sound Database in Real Environments for Sound Scene Understanding and Hands-Free Speech Recognition , 2000, LREC.

[17]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[18]  Chloé Clavel,et al.  Events Detection for an Audio-Based Surveillance System , 2005, 2005 IEEE International Conference on Multimedia and Expo.

[19]  Heikki Huttunen,et al.  Recurrent neural networks for polyphonic sound event detection in real life recordings , 2016, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[20]  Nikos Fakotakis,et al.  On acoustic surveillance of hazardous situations , 2009, 2009 IEEE International Conference on Acoustics, Speech and Signal Processing.

[21]  Francesc Alías,et al.  Two-step detection of water sound events for the diagnostic and monitoring of dementia , 2013, 2013 IEEE International Conference on Multimedia and Expo (ICME).

[22]  Takumi Kobayashi,et al.  Audio Data Mining for Anthropogenic Disaster Identification: An Automatic Taxonomy Approach , 2020, IEEE Transactions on Emerging Topics in Computing.

[23]  Doris Y. Tsao,et al.  The Code for Facial Identity in the Primate Brain , 2017, Cell.

[24]  Xiao Qin,et al.  Learnt dictionary based active learning method for environmental sound event tagging , 2019, Multimedia Tools and Applications.

[25]  David Stutz,et al.  Neural Codes for Image Retrieval , 2015 .

[26]  Sridhar Krishnan,et al.  Time–Frequency Matrix Feature Extraction and Classification of Environmental Audio Signals , 2011, IEEE Transactions on Audio, Speech, and Language Processing.

[27]  Haizhou Li,et al.  Spectrogram Image Feature for Sound Event Classification in Mismatched Conditions , 2011, IEEE Signal Processing Letters.

[28]  Tuomas Virtanen,et al.  Sound Event Detection in Multichannel Audio Using Spatial and Harmonic Features , 2017, DCASE.

[29]  Chih-Jen Lin,et al.  LIBLINEAR: A Library for Large Linear Classification , 2008, J. Mach. Learn. Res..

[30]  Pascal Vincent,et al.  Stacked Denoising Autoencoders: Learning Useful Representations in a Deep Network with a Local Denoising Criterion , 2010, J. Mach. Learn. Res..

[31]  Chang-Hong Lin,et al.  Gabor-Based Nonuniform Scale-Frequency Map for Environmental Sound Classification in Home Automation , 2014, IEEE Transactions on Automation Science and Engineering.

[32]  Keansub Lee,et al.  Minimal-impact audio-based personal archives , 2004, CARPE'04.