Detection of Acoustic Events by using MFCC and Spectro-Temporal Gabor Filterbank Features

Acoustic event detection (AED) is concerned with recognizing sounds produced by humans, by objects that humans handle, or by nature. Detecting acoustic events is an important task for intelligent systems that are expected to recognize not only speech but also the sounds of indoor and outdoor environments, with applications that include information retrieval, audio-based surveillance, and monitoring systems. Current systems for detecting and classifying events in everyday monophonic sound are mature enough to extract features and detect isolated events fairly accurately, but accuracy is very low for large datasets and for noisy or overlapping audio events. Most real-life sounds are polyphonic, and events partly overlap, which makes them harder to detect. In this work we discuss these open issues in the detection and feature extraction of acoustic events. We use the DCASE dataset, published for an international IEEE AASP challenge on acoustic event detection, which includes the "office live" recordings made in an office environment. Mel-frequency cepstral coefficients (MFCCs) are commonly used for feature extraction from speech and acoustic events. We propose using Gabor filterbank features in addition to MFCCs. For classification we use a decision tree algorithm, which gives better classification and detection results. Finally, we compare our proposed system with the other systems evaluated on the DCASE dataset and conclude that our technique gives the best F-score for event detection among them.
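To illustrate the idea of combining MFCCs with spectro-temporal Gabor filterbank features, the following is a minimal sketch and not the authors' implementation. It assumes a log-mel spectrogram and an MFCC matrix are already available (random placeholders stand in for them here). The function names, filter sizes, modulation frequencies, and the Hann envelope are all illustrative assumptions.

```python
import numpy as np

def gabor_filter_2d(omega_k, omega_n, size_k=9, size_n=9):
    """Real 2D Gabor filter: cosine carrier times a Hann envelope (assumed form)."""
    k = np.arange(size_k) - size_k // 2          # spectral (band) offsets
    n = np.arange(size_n) - size_n // 2          # temporal (frame) offsets
    # Drop the zero end points of the Hann window so every tap contributes.
    envelope = np.outer(np.hanning(size_k + 2)[1:-1], np.hanning(size_n + 2)[1:-1])
    carrier = np.cos(omega_k * k[:, None] + omega_n * n[None, :])
    g = envelope * carrier
    return g - g.mean()                          # remove the DC component

def filter_spectrogram(log_mel, g):
    """Same-size 2D correlation of a (bands, frames) spectrogram with filter g."""
    pk, pn = g.shape[0] // 2, g.shape[1] // 2
    padded = np.pad(log_mel, ((pk, pk), (pn, pn)), mode="edge")
    out = np.zeros(log_mel.shape, dtype=float)
    bands, frames = log_mel.shape
    for i in range(g.shape[0]):
        for j in range(g.shape[1]):
            out += g[i, j] * padded[i:i + bands, j:j + frames]
    return out

def combine_features(mfcc, log_mel, filters):
    """Stack MFCCs and Gabor-filtered spectrograms into per-frame feature vectors."""
    gabor = np.vstack([filter_spectrogram(log_mel, g) for g in filters])
    return np.vstack([mfcc, gabor]).T            # shape: (frames, features)

# Placeholder inputs (assumptions): 23 mel bands, 13 MFCCs, 100 frames.
rng = np.random.default_rng(0)
log_mel = rng.normal(size=(23, 100))
mfcc = rng.normal(size=(13, 100))

# Three hypothetical filters: spectral, temporal, and diagonal modulation.
filters = [
    gabor_filter_2d(np.pi / 4, 0.0),
    gabor_filter_2d(0.0, np.pi / 4),
    gabor_filter_2d(np.pi / 4, np.pi / 4),
]
features = combine_features(mfcc, log_mel, filters)
print(features.shape)
```

Each row of `features` is one frame's combined feature vector. In a pipeline like the one described above, such vectors would then be passed, together with frame-level event labels, to a classifier such as a decision tree.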
