An evaluation framework for event detection using a morphological model of acoustic scenes

This paper introduces a model of environmental acoustic scenes which adopts a morphological approach by abstracting the temporal structures of acoustic scenes. To demonstrate its potential, the model is used to evaluate the performance of a large set of acoustic event detection systems. The model allows us to explicitly control key morphological aspects of the acoustic scene and to isolate their impact on the performance of the system under evaluation. More information can thus be gained on the behavior of the evaluated systems, providing guidance for further improvements. The proposed model is validated using systems submitted to the IEEE DCASE Challenge; results indicate that the proposed scheme is able to build datasets useful for evaluating some aspects of the performance of event detection systems, in particular their robustness to new listening conditions and to increasing levels of background sound.
