Efficient acoustic detector of gunshots and glass breaking

This paper presents EAR-TUKE, an efficient acoustic event detection system. The system processes a continuous input audio stream in order to detect potentially dangerous acoustic events, specifically gunshots and breaking glass. It is written entirely in C++ (with core mathematical functions in C) and was designed to be self-sufficient, without requiring additional dependencies. The design and development focused on four goals: easy addition of new acoustic event types, a low memory footprint, computational requirements low enough for devices with limited resources, and long-term, maintenance-free monitoring of a continuous input stream. To satisfy these requirements, EAR-TUKE is based on a custom approach to the detection and classification of acoustic events. The system uses acoustic models of events based on Hidden Markov Models (HMMs) and a modified Viterbi decoding process with an additional module that allows continuous monitoring. Combined with Weighted Finite-State Transducers (WFSTs) for representing the search network, these components make the system easy to extend. The preprocessing part includes extraction algorithms for Mel-Frequency Cepstral Coefficients (MFCC), Filter Bank Coefficients (FBANK) and Mel-Spectral Coefficients (MELSPEC). To increase robustness, the system also applies Cepstral Mean Normalization (CMN) and our proposed removal of basic coefficients from the feature vectors. The paper also describes the development process and presents results evaluating the final design of the system.
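Cepstral Mean Normalization subtracts, for every feature dimension d, the mean of that dimension over a block of T frames: c'_t[d] = c_t[d] - (1/T) * sum over tau of c_tau[d]. This removes constant offsets, such as those introduced by a fixed microphone or channel, from the features. The C++ sketch below shows only the textbook batch form over a fixed block of frames. The names `Frame` and `cepstralMeanNormalize` are hypothetical and not taken from EAR-TUKE. How the system chooses the averaging window for a continuous stream is not stated here, so a running or sliding-window mean would be an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// One feature vector, e.g. the MFCC coefficients of a single frame.
// Hypothetical type, not part of EAR-TUKE.
using Frame = std::vector<double>;

// Batch Cepstral Mean Normalization: subtract the per-dimension mean,
// computed over all frames passed in, from every frame.
// A streaming detector would instead keep a running or sliding-window
// mean; that variant is an assumption and is not shown here.
std::vector<Frame> cepstralMeanNormalize(const std::vector<Frame>& frames) {
    if (frames.empty()) return {};
    const std::size_t dim = frames.front().size();

    // Accumulate per-dimension sums, then divide by the frame count.
    std::vector<double> mean(dim, 0.0);
    for (const Frame& f : frames) {
        assert(f.size() == dim && "all frames must have the same dimension");
        for (std::size_t d = 0; d < dim; ++d) mean[d] += f[d];
    }
    for (double& m : mean) m /= static_cast<double>(frames.size());

    // Subtract the mean vector from each frame.
    std::vector<Frame> out;
    out.reserve(frames.size());
    for (const Frame& f : frames) {
        Frame g(dim);
        for (std::size_t d = 0; d < dim; ++d) g[d] = f[d] - mean[d];
        out.push_back(std::move(g));
    }
    return out;
}
```

For example, normalizing the three two-dimensional frames {1, 10}, {3, 20} and {5, 30} yields {-2, -10}, {0, 0} and {2, 10}, since the per-dimension means are 3 and 20.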
