Reliable detection of audio events in highly noisy environments

The authors propose an audio event detection system tailored to surveillance applications. The method has been tested on a large and challenging data set, made publicly available. The performance analysis has been done for low SNR values and under various conditions. A comparative analysis with other methods from the literature has been performed.

In this paper we propose a novel method for the detection of audio events for surveillance applications. The method is based on the bag of words approach, adapted to deal with the specific issues of audio surveillance: the need to recognize both short and long sounds, the presence of a significant noise level, and the presence of superimposed background sounds of intensity comparable to the audio events to be detected. In order to test the proposed method in complex, realistic scenarios, we have built a large, publicly available dataset of audio events. The dataset has allowed us to evaluate the robustness of our method with respect to varying levels of the Signal-to-Noise Ratio; the experimentation has confirmed its applicability under real-world conditions, and has shown a significant performance improvement with respect to other methods from the literature.
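As a rough illustration of the two ideas named above, the following Python sketch shows a generic bag of words histogram over per-frame audio features and a way to superimpose a background sound on an event at a chosen Signal-to-Noise Ratio. It is a minimal sketch under stated assumptions, not the method of the paper: the abstract does not specify the features, the codebook construction, or the classifier, so the function names `bag_of_words_histogram` and `mix_at_snr`, the hard nearest-codeword assignment, and the power-based SNR definition are all assumptions made for illustration.

```python
import numpy as np


def bag_of_words_histogram(frames, codebook):
    """Return a normalized histogram of codeword occurrences.

    frames:   (n_frames, n_features) array of per-frame feature vectors
              (hypothetical; the abstract does not name the features).
    codebook: (n_words, n_features) array of codewords, assumed to be
              learned beforehand (for example by clustering).
    """
    frames = np.asarray(frames, dtype=float)
    codebook = np.asarray(codebook, dtype=float)
    # Squared Euclidean distance between every frame and every codeword.
    dists = ((frames[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    # Hard assignment: each frame votes for its nearest codeword.
    nearest = dists.argmin(axis=1)
    hist = np.bincount(nearest, minlength=len(codebook)).astype(float)
    # Normalizing makes short and long sounds comparable.
    return hist / hist.sum()


def mix_at_snr(event, background, snr_db):
    """Superimpose background on event so that the power ratio equals snr_db.

    Assumes both signals have the same length and non-zero power.
    """
    event = np.asarray(event, dtype=float)
    background = np.asarray(background, dtype=float)
    p_event = np.mean(event ** 2)
    p_background = np.mean(background ** 2)
    # Gain that brings the background to the requested power level.
    gain = np.sqrt(p_event / (p_background * 10.0 ** (snr_db / 10.0)))
    return event + gain * background
```

The normalized histogram would then be passed to a classifier of the reader's choice; at 0 dB the scaled background has the same power as the event, which corresponds to the "comparable intensity" condition mentioned in the abstract.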
