EAR-TUKE: The Acoustic Event Detection System

This paper introduces an acoustic event detection system capable of processing a continuous input audio stream in order to detect potentially dangerous acoustic events. The system is a lightweight, easily extensible, long-term running, and complete solution for acoustic event detection. It is based on its own approach to the detection and classification of acoustic events, using a modified Viterbi decoding process in combination with Weighted Finite-State Transducers (WFSTs) to support extensibility, and acoustic modeling based on Hidden Markov Models (HMMs). The system is written entirely in C++ and was designed to be self-sufficient, requiring no additional dependencies. A signal preprocessing part is also included for extracting Mel-Frequency Cepstral Coefficients (MFCC), Frequency Bank Coefficients (FBANK), and Mel-Spectral Coefficients (MELSPEC). To increase robustness, the system includes Cepstral Mean Normalization (CMN) and our proposed removal of basic coefficients from the feature vector.
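
As an illustration of the CMN step mentioned above, the following is a minimal sketch and not the EAR-TUKE implementation. It assumes the textbook batch form of CMN, in which the mean of each coefficient over all frames is subtracted from every frame; a system working on a continuous stream would more likely use a running estimate of the mean. The names `Frame` and `applyCmn` are hypothetical and chosen only for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One feature vector (e.g. the cepstral coefficients of a single frame).
// Hypothetical type name, used only in this sketch.
using Frame = std::vector<double>;

// Batch cepstral mean normalization (assumed variant): subtract the
// per-coefficient mean, computed over all frames, from every frame.
void applyCmn(std::vector<Frame>& frames) {
    if (frames.empty()) return;

    const std::size_t dim = frames.front().size();
    std::vector<double> mean(dim, 0.0);

    // Accumulate the sum of each coefficient over all frames.
    for (const Frame& f : frames) {
        assert(f.size() == dim);  // all frames must share one dimension
        for (std::size_t i = 0; i < dim; ++i) mean[i] += f[i];
    }

    // Turn the sums into means.
    for (double& m : mean) m /= static_cast<double>(frames.size());

    // Subtract the mean from every frame in place.
    for (Frame& f : frames)
        for (std::size_t i = 0; i < dim; ++i) f[i] -= mean[i];
}
```

After the call, each coefficient has zero mean across the processed frames, which removes a constant offset in the cepstral domain such as the one introduced by a fixed recording channel.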
