Acoustic Scene Classification: Classifying environments from the sounds they produce

In this article, we present an account of the state of the art in acoustic scene classification (ASC), the task of classifying environments from the sounds they produce. Starting from a historical review of previous research in this area, we define a general framework for ASC and present different implementations of its components. We then describe a range of different algorithms submitted for a data challenge that was held to provide a general and fair benchmark for ASC techniques. The data set recorded for this purpose is presented along with the performance metrics that are used to evaluate the algorithms and statistical significance tests to compare the submitted methods.

[1]  Yangsheng Xu,et al.  Intelligent wearable interfaces , 2007 .

[2]  Vesa T. Peltonen,et al.  Computational auditory scene recognition , 2002, 2002 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[3]  J. Ballas Common factors in the identification of an assortment of brief everyday sounds. , 1993, Journal of experimental psychology. Human perception and performance.

[4]  François Pachet,et al.  Improving Timbre Similarity : How high’s the sky ? , 2004 .

[5]  Alexandros Nanopoulos,et al.  Hubs in Space: Popular Nearest Neighbors in High-Dimensional Data , 2010, J. Mach. Learn. Res..

[6]  D. Dubois,et al.  A cognitive approach to urban soundscapes : Using verbal data to access everyday life auditory categories , 2006 .

[7]  Alexander H. Waibel,et al.  Classifying user environment for mobile applications using linear autoencoding of ambient audio , 2005, Proceedings. (ICASSP '05). IEEE International Conference on Acoustics, Speech, and Signal Processing, 2005..

[8]  D. Wang,et al.  Computational Auditory Scene Analysis: Principles, Algorithms, and Applications , 2008, IEEE Trans. Neural Networks.

[9]  Markus Rupp,et al.  Reproducible research in signal processing , 2009, IEEE Signal Processing Magazine.

[10]  François Pachet,et al.  Automatic Recognition of Urban Sound Sources , 2006 .

[11]  Nobutaka Ito,et al.  The Diverse Environments Multi-channel Acoustic Noise Database (DEMAND): A database of multichannel environmental noise recordings , 2013 .

[12]  Shigeo Abe DrEng Pattern Classification , 2001, Springer London.

[13]  Dan Stowell,et al.  Detection and Classification of Acoustic Scenes and Events , 2015, IEEE Transactions on Multimedia.

[14]  Patrick Kenny,et al.  Support vector machines versus fast scoring in the low-dimensional total variability space for speaker verification , 2009, INTERSPEECH.

[15]  Robert E. Schapire,et al.  The Boosting Approach to Machine Learning An Overview , 2003 .

[16]  Keansub Lee,et al.  Minimal-impact audio-based personal archives , 2004, CARPE'04.

[17]  Dan Stowell,et al.  A database and challenge for acoustic scene classification and event detection , 2013, 21st European Signal Processing Conference (EUSIPCO 2013).

[18]  Biing-Hwang Juang,et al.  Fundamentals of speech recognition , 1993, Prentice Hall signal processing series.

[19]  Volume Assp,et al.  ACOUSTICS. SPEECH. AND SIGNAL PROCESSING , 1983 .

[20]  Juha T. Tuomi,et al.  Audio-based context awareness - acoustic modeling and perceptual evaluation , 2003, 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03)..

[21]  Vincent Fontaine,et al.  Automatic classification of environmental noise events by hidden Markov models , 1998, Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP '98 (Cat. No.98CH36181).

[22]  T. Andringa,et al.  DARES-G 1 : Database of Annotated Real-world Everyday Sounds , 2009 .

[23]  Daniel Patrick Whittlesey Ellis,et al.  Prediction-driven computational auditory scene analysis , 1996 .

[24]  Tuomas Virtanen,et al.  Context-dependent sound event detection , 2013, EURASIP Journal on Audio, Speech, and Music Processing.

[25]  Ronald H. Rnndles Nonparametric Statistical Inference (2nd ed.) , 1986 .

[26]  Patrick Susini,et al.  Perceptual study of soundscapes in train stations , 2008 .

[27]  Radford M. Neal Pattern Recognition and Machine Learning , 2007, Technometrics.

[28]  Shrikanth Narayanan,et al.  Environmental Sound Recognition With Time–Frequency Audio Features , 2009, IEEE Transactions on Audio, Speech, and Language Processing.

[29]  C.-C. Jay Kuo,et al.  Where am I? Scene Recognition for Mobile Robots using Audio Features , 2006, 2006 IEEE International Conference on Multimedia and Expo.

[30]  J. W. Gordon,et al.  Perceptual effects of spectral modifications on musical timbres , 1978 .

[31]  Dan Stowell,et al.  Detection and classification of acoustic scenes and events: An IEEE AASP challenge , 2013, 2013 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics.

[32]  Julien Pinquier,et al.  Water sound recognition based on physical models , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.

[33]  Stephen McAdams,et al.  Recognition of sound sources and events , 1993 .

[34]  Bill N. Schilit,et al.  Context-aware computing applications , 1994, Workshop on Mobile Computing Systems and Applications.

[35]  Joshua D. Reiss,et al.  Enabling access to sound archives through integration, enrichment and retrieval , 2008, 2008 IEEE International Conference on Multimedia and Expo.

[36]  Bhiksha Raj,et al.  Unsupervised Learning of Acoustic Unit Descriptors for Audio Content Representation and Classification , 2011, INTERSPEECH.

[37]  DeLiang Wang,et al.  Auditory Segmentation Based on Onset and Offset Analysis , 2007, IEEE Transactions on Audio, Speech, and Language Processing.

[38]  C.-C. Jay Kuo,et al.  Audio content analysis for online audiovisual data segmentation and classification , 2001, IEEE Trans. Speech Audio Process..

[39]  François Pachet,et al.  The bag-of-frames approach to audio pattern recognition: a sufficient model for urban soundscapes but not for polyphonic music. , 2007, The Journal of the Acoustical Society of America.

[40]  Pattie Maes,et al.  Situational Awareness from Environmental Sounds , 1997 .

[41]  Lie Lu,et al.  Content analysis for audio classification and segmentation , 2002, IEEE Trans. Speech Audio Process..

[42]  R. M. Schafer,et al.  The tuning of the world , 1977 .

[43]  R. Radhakrishnan,et al.  Audio analysis for surveillance applications , 2005, IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2005..

[44]  Peter Kabal,et al.  Frame level noise classification in mobile environments , 1999, 1999 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings. ICASSP99 (Cat. No.99CH36258).

[45]  Birger Kollmeier,et al.  On the use of spectro-temporal features for the IEEE AASP challenge ‘detection and classification of acoustic scenes and events’ , 2013, 2013 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics.

[46]  Ning Ma,et al.  The PASCAL CHiME speech separation and recognition challenge , 2013, Comput. Speech Lang..

[47]  B. Kollmeier,et al.  Challenge on Detection and Classification of Acoustic Scenes and Events ACOUSTIC EVENT DETECTION USING SIGNAL ENHANCEMENT AND SPECTRO-TEMPORAL FEATURE EXTRACTION , 2013 .

[48]  Tuomas Virtanen,et al.  Audio context recognition using audio event histograms , 2010, 2010 18th European Signal Processing Conference.

[49]  Vesa T. Peltonen,et al.  Audio-based context recognition , 2006, IEEE Transactions on Audio, Speech, and Language Processing.

[50]  Subhabrata Chakraborti,et al.  Nonparametric Statistical Inference , 2011, International Encyclopedia of Statistical Science.

[51]  Alex Pentland,et al.  Auditory Context Awareness via Wearable Computing , 1998 .

[52]  Mathieu Lagrange,et al.  Characterisation of acoustic scenes using a temporally-constrained shift-invariant model , 2012 .

[53]  Sergey Ioffe,et al.  Probabilistic Linear Discriminant Analysis , 2006, ECCV.

[54]  Anssi Klapuri,et al.  Recognition of Everyday Auditory Scenes: Potentials, Latencies and Cues , 2001 .