Audio Surveillance: a Systematic Review

Although surveillance systems are becoming increasingly ubiquitous in our living environment, automated surveillance, currently based on the video sensory modality and machine intelligence, often lacks the robustness and reliability required in several real applications. To tackle this issue, audio sensory devices have been taken into account, either alone or in combination with video, giving rise, in the last decade, to a considerable amount of research. In this paper, audio-based automated surveillance methods are organized into a comprehensive survey: a general taxonomy, inspired by the more widespread video surveillance field, is proposed in order to systematically describe the methods covering background subtraction, event classification, object tracking and situation analysis. For each of these tasks, all the significant works are reviewed, detailing their pros and cons and the context for which they have been proposed. Moreover, a specific section is devoted to audio features, discussing their expressiveness and their use in the tasks described above. Unlike other surveys on audio processing and analysis, the present one is specifically targeted at automated surveillance, highlighting the target applications of each described method and providing the reader with tables and schemes useful for retrieving the algorithms best suited to a specific requirement.
