Disambiguating Sound through Context

A central problem in automatic sound recognition is mapping low-level audio features onto the meaningful content of an auditory scene. We propose a dynamic network model to perform this mapping. Much research in acoustics is devoted to low-level perceptual abilities such as audio feature extraction and grouping, and this work has been translated into successful signal processing techniques. However, little work has been done on modeling knowledge and context in sound recognition, although this information is necessary to identify a sound event rather than merely to separate its components from a scene. We first investigate the role of context in human sound identification in a simple experiment. We then show that using knowledge in a dynamic network model can improve automatic sound identification by reducing the search space of the low-level audio features. Furthermore, context information resolves ambiguities that arise when one sound event admits multiple interpretations.
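The idea that context can resolve an ambiguous sound event can be illustrated with a small sketch. This is not the dynamic network model proposed here; it is a minimal, assumed example in which context nodes pass weighted activation to sound-event hypotheses, and that support is added to the acoustic evidence. The node names, link weights, the `context_weight` parameter, and the function `rank_hypotheses` are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical link weights between context nodes and sound-event hypotheses.
LINKS = {
    "kitchen": {"boiling kettle": 0.8, "steam train": 0.1},
    "railway station": {"steam train": 0.8, "boiling kettle": 0.1},
}


def rank_hypotheses(acoustic_scores, active_context, context_weight=0.5):
    """Rank sound-event hypotheses by acoustic evidence plus context support.

    acoustic_scores: bottom-up score per hypothesis (assumed to come from
        some low-level feature matcher that is not modeled here).
    active_context: activation per context node currently believed active.
    """
    # Spread activation from each active context node to linked hypotheses.
    support = defaultdict(float)
    for context_node, activation in active_context.items():
        for event, link in LINKS.get(context_node, {}).items():
            support[event] += activation * link

    # Combine bottom-up and top-down evidence for each hypothesis.
    combined = {
        event: score + context_weight * support[event]
        for event, score in acoustic_scores.items()
    }
    return sorted(combined, key=combined.get, reverse=True)


# Two interpretations that the acoustic evidence alone cannot separate.
ambiguous = {"boiling kettle": 0.5, "steam train": 0.5}
print(rank_hypotheses(ambiguous, {"kitchen": 1.0}))
print(rank_hypotheses(ambiguous, {"railway station": 1.0}))
```

With equal acoustic scores, the ranking is decided entirely by which context node is active. A fuller model would also use the context to restrict which low-level features are searched, which this sketch does not attempt.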
