Global statistical features-based approach for Acoustic Event Detection

Abstract: The analysis of acoustic data typically addresses the problem of segmenting acoustic events into non-overlapping, acoustically compact categories. In Acoustic Event Detection (AED), acoustic events are categorized as speech or non-speech events. Detecting non-speech sounds such as screams, gunshots, explosions, and breaking glass is very helpful in acoustic surveillance, multimedia information retrieval, and acoustic forensic applications. In this paper, we propose a representation based on global statistical features for multivariate, varying-length acoustic data. A discriminative model-based classifier is then used to classify the different acoustic events. The proposed representation has very low dimensionality. The proposed approach is evaluated on surveillance-oriented AED datasets: CICESE (recorded in a smart-room scenario), Environmental Sound Classification (ESC), and IEEE AASP/DCASE2013 (office environment). The proposed approach performs better than the conventional Hidden Markov Model (HMM) and Gaussian Mixture Model (GMM) approaches.
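The idea of a global statistical representation can be illustrated with a minimal sketch: a clip of any length, given as a sequence of frame-level feature vectors, is summarized by per-dimension statistics so that every clip maps to a vector of the same fixed size. The abstract does not say which statistics or frame-level features are used, so the choice of mean, standard deviation, minimum, and maximum, as well as the function name `global_stat_features`, are assumptions made only for illustration.

```python
from statistics import fmean, pstdev
from typing import List, Sequence


def global_stat_features(frames: Sequence[Sequence[float]]) -> List[float]:
    """Summarize a variable-length sequence of frame vectors as one fixed-length vector.

    The four statistics used here (mean, population std, min, max) are an
    illustrative assumption, not the set used in the paper.
    """
    if not frames:
        raise ValueError("need at least one frame")
    # Transpose: one tuple of values per feature dimension.
    per_dimension = list(zip(*frames))
    features: List[float] = []
    for values in per_dimension:
        features.extend([fmean(values), pstdev(values), min(values), max(values)])
    return features


# Two hypothetical clips with different numbers of frames (2 vs. 5),
# each frame holding 2 feature values.
short_clip = [[0.0, 1.0], [2.0, 3.0]]
long_clip = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0], [4.0, 4.0], [5.0, 5.0]]

short_vec = global_stat_features(short_clip)
long_vec = global_stat_features(long_clip)

# Both clips yield vectors of the same length: 4 statistics x 2 dimensions.
print(len(short_vec), len(long_vec))
```

Because the output length depends only on the number of feature dimensions and not on the clip duration, such vectors could be passed directly to a fixed-input discriminative classifier (for example, a support vector machine), which is presumably what makes the representation compact compared with frame-level sequence models such as HMMs.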
