A proposal for the description of audio in the context of MPEG-7

Sound content description is one of the aims of the MPEG-7 initiative. Although MPEG-7 focuses on the indexing and retrieval of audio, other content-based sound processing applications are waiting to be developed once we have a robust set of descriptors, structures for relating them to one another, and means of expressing semantic concerns about sound. Spectral modeling techniques provide one usable framework for extracting and organizing sound content descriptions. In this paper we introduce one particular approach to spectral modeling, present some sound descriptors that can be derived from it in order to build sound descriptions, and discuss the features of a structure (a so-called "Description Scheme") for organizing the information derived from them. All of our current descriptors can be considered low- or mid-level, so we do not cover high-level descriptions (musical forms and styles, roles of characters in a movie, etc.), which are nevertheless also relevant to MPEG-7. The proposed descriptors result from a sound analysis based on a spectral modeling technique, and we have devised automatic extraction procedures for all of them. The Description Scheme we present is intended to be generic: based on a hierarchical (and in places recursive) structure, it can describe sound at multiple levels of detail, addressing both syntactic (structural) and semantic (content) ways of describing sound.

Xavier Serra | Perfecto Herrera | Geoffroy Peeters
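The abstract does not spell out the descriptor set or the concrete schema of the Description Scheme. As a minimal sketch, assuming a simple recursive segment tree, the Python below illustrates how automatically extracted low-level descriptors could be attached to sound at multiple levels of detail. Everything here is hypothetical and for illustration only: the `Segment` class, the `describe` and `frame_descriptors` functions, the naive halving used for segmentation, and the two descriptors (RMS energy and spectral centroid) are not the authors' descriptors or the MPEG-7 schema, and a plain DFT magnitude spectrum stands in for the paper's spectral modeling analysis.

```python
import cmath
import math
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Segment:
    """Hypothetical node of a hierarchical sound description (not the paper's schema)."""
    start: int                                   # first sample index (inclusive)
    end: int                                     # last sample index (exclusive)
    descriptors: Dict[str, float] = field(default_factory=dict)
    children: List["Segment"] = field(default_factory=list)
    label: Optional[str] = None                  # slot for a semantic (content) annotation


def magnitude_spectrum(frame: List[float]) -> List[float]:
    """Naive DFT magnitudes for bins 0..N/2; O(N^2), acceptable for short frames."""
    n = len(frame)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame)))
            for k in range(n // 2 + 1)]


def frame_descriptors(frame: List[float], sr: int) -> Dict[str, float]:
    """Two generic low-level descriptors: RMS energy and spectral centroid in Hz."""
    n = len(frame)
    rms = math.sqrt(sum(x * x for x in frame) / n)
    mags = magnitude_spectrum(frame)
    total = sum(mags)
    # Centroid = magnitude-weighted mean of the bin frequencies k * sr / n.
    centroid = sum(k * sr / n * m for k, m in enumerate(mags)) / total if total > 0 else 0.0
    return {"rms": rms, "centroid_hz": centroid}


def describe(signal: List[float], sr: int, start: int, end: int,
             frame_size: int = 256, depth: int = 2) -> Segment:
    """Recursively describe signal[start:end] (non-empty) at up to depth + 1 levels."""
    seg = Segment(start, end)
    if depth == 0 or end - start < 2 * frame_size:
        # Leaf: average frame-level descriptors over non-overlapping frames;
        # a span shorter than one frame is analysed as a single frame.
        frames = [signal[i:i + frame_size]
                  for i in range(start, end - frame_size + 1, frame_size)] or [signal[start:end]]
        per_frame = [frame_descriptors(f, sr) for f in frames]
        seg.descriptors = {key: sum(d[key] for d in per_frame) / len(per_frame)
                           for key in per_frame[0]}
        return seg
    # Assumption: naive halving stands in for content-based segmentation.
    mid = (start + end) // 2
    seg.children = [describe(signal, sr, start, mid, frame_size, depth - 1),
                    describe(signal, sr, mid, end, frame_size, depth - 1)]
    # Parent level: duration-weighted mean of the children's descriptors.
    total = end - start
    seg.descriptors = {key: sum(c.descriptors[key] * (c.end - c.start) for c in seg.children) / total
                       for key in seg.children[0].descriptors}
    return seg


# Usage: a 1 kHz sine sampled at 8 kHz, described at three levels of detail.
sr = 8000
tone = [math.sin(2 * math.pi * 1000 * t / sr) for t in range(2048)]
root = describe(tone, sr, 0, len(tone))
print(len(root.children), round(root.descriptors["centroid_hz"]))  # 2 1000
```

Parent nodes summarize their children with a duration-weighted mean, so coarser levels stay consistent with finer ones. A real scheme would presumably segment on content boundaries rather than halving, and the `label` field marks where a semantic (content) annotation could sit alongside the syntactic (structural) tree.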
