Sound Events Recognition and Retrieval Using Multi-Convolutional-Channel Sparse Coding Convolutional Neural Networks

This article proposes two novel deep convolutional neural networks (CNNs), the sparse coding convolutional neural network (SC-CNN) and the multi-convolutional-channel SC-CNN (MSC-CNN), to address sound event recognition and retrieval. In a conventional CNN, features are learned hierarchically. The proposed framework instead models the entire memorization process of the human brain, including encoding, storage, and recollection. In particular, the MSC-CNN is designed to recognize multiple sound events that occur simultaneously. The experimental results indicate that the proposed SC-CNN and MSC-CNN outperform state-of-the-art systems in sound event recognition and retrieval.
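The abstract does not say how the sparse-coding stage of the SC-CNN is formulated. As a generic illustration of what a sparse code is, the sketch below solves the textbook sparse coding problem min_z ½‖x − Dz‖² + λ‖z‖₁ with the iterative shrinkage-thresholding algorithm (ISTA). ISTA is a swapped-in standard solver, not the authors' layer, and the dictionary `D`, penalty `lam`, and step size are hypothetical toy values chosen only for the example.

```python
import math

def soft_threshold(v, t):
    """Shrink each entry of v toward zero by t (proximal operator of t*||.||_1)."""
    return [math.copysign(max(abs(x) - t, 0.0), x) for x in v]

def matvec(A, v):
    """Multiply matrix A (list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def ista(x, D, lam, step, n_iter=200):
    """Approximately minimise 0.5*||x - D z||^2 + lam*||z||_1 with ISTA.

    D is given as rows (len(x) x K); its K columns are the dictionary atoms.
    step should not exceed 1/L, where L is the largest eigenvalue of D^T D.
    """
    Dt = transpose(D)
    z = [0.0] * len(Dt)
    for _ in range(n_iter):
        residual = [xi - ri for xi, ri in zip(x, matvec(D, z))]  # x - D z
        corr = matvec(Dt, residual)  # D^T (x - D z), the negative gradient
        # Gradient step on the quadratic term, then shrinkage for the l1 term.
        z = soft_threshold([zi + step * ci for zi, ci in zip(z, corr)], step * lam)
    return z

# Hypothetical overcomplete dictionary in 2-D: atoms e1, e2 and (e1 + e2)/sqrt(2).
S = 1.0 / math.sqrt(2.0)
D = [[1.0, 0.0, S],
     [0.0, 1.0, S]]
x = [1.0, 1.0]
code = ista(x, D, lam=0.1, step=0.5)  # L = 2 for this D, so step = 1/L
print([round(c, 4) for c in code])  # -> [0.0, 0.0, 1.3142]
```

The signal [1, 1] could be explained by the first two atoms or by the diagonal atom alone; the ℓ1 penalty drives the solver to the sparser choice, a single nonzero coefficient of √2 − λ on the diagonal atom.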