Sound Ontology for Computational Auditory Scene Analysis

This paper proposes that a sound ontology be used both as a common vocabulary for sound representation and as a common terminology for integrating various sound stream segregation systems. Research on computational auditory scene analysis (CASA) focuses on recognizing and understanding various kinds of sounds, so sound stream segregation, which extracts each sound stream from a mixture of sounds, is essential for CASA. Even when sound stream segregation systems all use the harmonic structure of sound as a segregation cue, integrating them is not easy, because their definitions of a harmonic structure differ or the precision of the extracted harmonic structures differs. A sound ontology is therefore needed as a common knowledge representation of sounds.

Another problem is interfacing sound stream segregation systems with applications such as automatic speech recognition systems. Because the required quality of segregated sound streams depends on the application, sound stream segregation systems must provide a flexible interface, and a sound ontology is needed to fulfill the requirements that applications impose. In addition, the hierarchical structure of the sound ontology provides a means of controlling top-down and bottom-up processing in sound stream segregation.
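As a purely illustrative sketch of what a hierarchical sound ontology could look like as a data structure, the following Python fragment models sound classes as a tree. The class name `SoundClass`, the node names, and the `cue` attribute are all hypothetical and are not taken from the paper; the sketch only shows how a hierarchy allows walking from a specific class to more general ones (bottom-up) and from a general class to more specific ones (top-down).

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class SoundClass:
    """One node of a hypothetical sound ontology (all names are illustrative)."""
    name: str
    # Exclude the parent link from repr/eq to avoid infinite recursion.
    parent: Optional["SoundClass"] = field(default=None, repr=False, compare=False)
    children: List["SoundClass"] = field(default_factory=list)
    attributes: Dict[str, str] = field(default_factory=dict)

    def add_child(self, name: str, **attributes: str) -> "SoundClass":
        """Create a more specific class below this one and return it."""
        child = SoundClass(name=name, parent=self, attributes=dict(attributes))
        self.children.append(child)
        return child

    def ancestors(self) -> List[str]:
        """Bottom-up direction: names from the direct parent up to the root."""
        path = []
        node = self.parent
        while node is not None:
            path.append(node.name)
            node = node.parent
        return path

    def descendants(self) -> List[str]:
        """Top-down direction: names of all more specific classes, depth-first."""
        names = []
        for child in self.children:
            names.append(child.name)
            names.extend(child.descendants())
        return names


# Assumed example hierarchy; the paper's actual ontology may differ.
sound = SoundClass("sound")
harmonic = sound.add_child("harmonic_sound", cue="harmonic structure")
speech = harmonic.add_child("speech")
music = harmonic.add_child("musical_tone")
noise = sound.add_child("non_harmonic_sound")
```

Under these assumptions, a segregation system that has extracted a harmonic stream could consult `harmonic.descendants()` to list candidate interpretations, while an application that asks for speech could use `speech.ancestors()` to find which more general streams it can accept.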
