Leveraging the urban soundscape: Auditory perception for smart vehicles

Urban environments are characterised by distinctive audio signals that alert drivers to events requiring prompt action. Detecting and interpreting these signals would be highly beneficial for smart vehicle systems, as it would provide them with complementary information for navigating safely. In this paper, we present a framework that detects acoustic events, such as horns and sirens, using a two-stage approach: we first model the urban soundscape and use anomaly detection to identify the presence of an anomalous sound, and then determine the nature of that sound. Because the audio samples are affected by copious non-stationary and unstructured noise, which can degrade classification performance, we propose a noise-removal technique that yields a clean representation of the data for both classification and waveform reconstruction. The technique analyses the spectrograms of the incoming signals as images and applies spectrogram segmentation to isolate and extract the alerting signals from the background noise. We evaluate our framework on four hours of urban sounds recorded while driving around Oxford on different types of road and in different traffic conditions. Compared with traditional feature representations such as Mel-frequency cepstral coefficients (MFCCs), our framework improves the classification rate by up to 31%.
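The idea of treating a spectrogram as an image and segmenting it can be illustrated with a minimal sketch. It assumes the magnitude spectrogram is given as a list of frequency-bin rows (one value per time frame), estimates the background noise floor of each bin as its median over time, marks cells that exceed `k` times that floor, and keeps only 4-connected regions of at least `min_size` cells. The function names, the median noise estimate, and the parameters `k` and `min_size` are hypothetical choices for illustration; this is a generic threshold-plus-connected-components segmentation, not the authors' algorithm.

```python
from collections import deque
from statistics import median
from typing import List

Spectrogram = List[List[float]]  # rows = frequency bins, columns = time frames
Mask = List[List[bool]]


def threshold_mask(spec: Spectrogram, k: float = 3.0) -> Mask:
    """Mark cells whose energy exceeds k times the median of their frequency bin.

    The per-bin median is a crude estimate of the background noise floor
    (an assumption made for this sketch).
    """
    mask = []
    for row in spec:
        floor = median(row)
        mask.append([value > k * floor for value in row])
    return mask


def keep_large_regions(mask: Mask, min_size: int = 4) -> Mask:
    """Drop 4-connected regions smaller than min_size cells (isolated noise)."""
    n_rows = len(mask)
    n_cols = len(mask[0]) if mask else 0
    seen = [[False] * n_cols for _ in range(n_rows)]
    out = [[False] * n_cols for _ in range(n_rows)]
    for r in range(n_rows):
        for c in range(n_cols):
            if not mask[r][c] or seen[r][c]:
                continue
            # Breadth-first flood fill collects one connected region.
            region = []
            queue = deque([(r, c)])
            seen[r][c] = True
            while queue:
                i, j = queue.popleft()
                region.append((i, j))
                for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ni, nj = i + di, j + dj
                    if (0 <= ni < n_rows and 0 <= nj < n_cols
                            and mask[ni][nj] and not seen[ni][nj]):
                        seen[ni][nj] = True
                        queue.append((ni, nj))
            if len(region) >= min_size:
                for i, j in region:
                    out[i][j] = True
    return out


def denoise(spec: Spectrogram, k: float = 3.0, min_size: int = 4) -> Spectrogram:
    """Apply the segmentation as a binary mask: zero every cell outside kept regions."""
    mask = keep_large_regions(threshold_mask(spec, k), min_size)
    return [[v if m else 0.0 for v, m in zip(row, mrow)]
            for row, mrow in zip(spec, mask)]


if __name__ == "__main__":
    # Toy spectrogram: flat background, a short tone in bin 2, one isolated spike.
    spec = [[1.0] * 10 for _ in range(5)]
    spec[2][1:5] = [10.0] * 4
    spec[0][9] = 10.0
    print(denoise(spec)[2])  # → [0.0, 10.0, 10.0, 10.0, 10.0, 0.0, 0.0, 0.0, 0.0, 0.0]
```

In the toy example the four-frame tone survives while the single-cell spike is discarded as noise. Framing the problem as image segmentation means standard tools (thresholding, connected-component labelling) apply directly; the masked spectrogram could then feed feature extraction, and reconstructing a waveform would additionally require phase information, which this sketch does not handle.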
