Audio Surveillance of Roads: A System for Detecting Anomalous Sounds

In recent decades, several video-based systems have been proposed for automatically detecting road accidents so that emergency teams can intervene quickly. However, in some situations visual information is insufficient or unreliable, and microphones combined with audio event detectors can significantly improve the overall reliability of a surveillance system. In this paper, we propose a novel method that detects road accidents by analyzing audio streams to identify hazardous events such as tire skidding and car crashes. The method is based on a two-layer representation of the audio stream: at the low level, the system extracts a set of features that captures the discriminant properties of the events of interest; at the high level, a bag-of-words representation is used to detect both short and sustained events. We discuss the deployment architecture for using the system in real environments and report an experimental analysis carried out on a data set made publicly available for benchmarking purposes. The results confirm the effectiveness of the proposed approach.
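
The two-layer representation described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the low-level features, the codebook method, or any parameter values, so everything here is an assumption chosen for brevity. The sketch uses log-energy and zero-crossing rate as stand-in features, a small hand-written k-means as the codebook learner, and hypothetical names (`frame_features`, `learn_codebook`, `bag_of_words`) and window sizes.

```python
import numpy as np

def frame_features(signal, frame_len=256, hop=128):
    """Low level: one feature vector per frame (stand-in features, assumed)."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    feats = np.empty((n_frames, 2))
    for i in range(n_frames):
        frame = signal[i * hop : i * hop + frame_len]
        log_energy = np.log(np.mean(frame ** 2) + 1e-12)
        zero_cross = np.mean(np.diff(np.sign(frame)) != 0)
        feats[i] = (log_energy, zero_cross)
    return feats

def quantize(feats, centers):
    """Map each frame to the index of its nearest codeword (its 'aural word')."""
    dists = np.linalg.norm(feats[:, None, :] - centers[None, :, :], axis=2)
    return dists.argmin(axis=1)

def learn_codebook(feats, k=8, iters=20, seed=0):
    """Plain k-means used as an assumed way to build the word dictionary."""
    rng = np.random.default_rng(seed)
    centers = feats[rng.choice(len(feats), size=k, replace=False)].copy()
    for _ in range(iters):
        labels = quantize(feats, centers)
        for j in range(k):
            members = feats[labels == j]
            if len(members):  # keep the old center if a cluster is empty
                centers[j] = members.mean(axis=0)
    return centers

def bag_of_words(words, k, window=32, step=16):
    """High level: normalized word histogram over each sliding time window."""
    hists = []
    for start in range(0, len(words) - window + 1, step):
        counts = np.bincount(words[start:start + window], minlength=k)
        hists.append(counts / window)
    return np.array(hists)

# Synthetic noise stands in for a real audio stream.
rng = np.random.default_rng(0)
audio = rng.normal(size=16000)
feats = frame_features(audio)
codebook = learn_codebook(feats, k=8)
hists = bag_of_words(quantize(feats, codebook), k=8)
```

Each row of `hists` summarizes one time window as a distribution over codewords. A classifier (not shown) would take these rows as input. In this sketch the window length sets how much temporal context each histogram covers, which is one plausible way to handle both short and sustained events.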
