A proposal for the description of audio in the context of MPEG-7

Sound content description is one of the aims of the MPEG-7 initiative. Although MPEG-7 focuses on indexing and retrieval of audio, other content-based sound processing applications are waiting to be developed once we have a robust set of descriptors, structures for relating them to one another, and ways of expressing semantic concerns about sound. Spectral modeling techniques provide one usable framework for extracting and organizing sound content descriptions. In this paper we introduce one particular approach to spectral modeling, present some sound descriptors that can be derived from it in order to build sound descriptions, and discuss the features of a structure for organizing the resulting information (a so-called "Description Scheme"). All of our current descriptors can be considered low- or mid-level, so we do not cover the high-level description of music (musical forms and styles, roles of characters in a movie, etc.), although it is also relevant to MPEG-7. The proposed descriptors are the result of a sound analysis based on a spectral modeling technique, and we have devised automatic extraction procedures for all of them. The Description Scheme we present is intended to be generic: based on a hierarchical (and in some places recursive) structure, it can describe sound at multiple levels of detail, addressing both syntactic (structural) and semantic (content) aspects of sound description.
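To make the idea of a hierarchical, recursive structure concrete, here is a minimal Python sketch. It is an illustration only, not the Description Scheme proposed in the paper: the class `SoundSegment`, its fields, the method `at_level`, and the descriptor name `pitch_hz` are all hypothetical. Each node carries a semantic label, a structural position in time, and a set of numeric descriptors, and may contain child nodes of the same type.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SoundSegment:
    """Hypothetical node of a hierarchical sound description (illustration only)."""
    label: str                      # semantic (content) annotation
    start: float                    # syntactic (structural) position, in seconds
    end: float
    descriptors: Dict[str, float] = field(default_factory=dict)
    children: List["SoundSegment"] = field(default_factory=list)  # recursion

    def at_level(self, depth: int) -> List["SoundSegment"]:
        """Return the segments `depth` levels below this node.

        A node without children is returned as-is, even if `depth`
        has not yet reached zero.
        """
        if depth == 0 or not self.children:
            return [self]
        found: List["SoundSegment"] = []
        for child in self.children:
            found.extend(child.at_level(depth - 1))
        return found


# Assumed example data: an excerpt containing one phrase made of two notes.
note1 = SoundSegment("note", 0.0, 0.5, {"pitch_hz": 440.0})
note2 = SoundSegment("note", 0.5, 1.0, {"pitch_hz": 494.0})
phrase = SoundSegment("phrase", 0.0, 1.0, children=[note1, note2])
excerpt = SoundSegment("excerpt", 0.0, 1.0, children=[phrase])

for segment in excerpt.at_level(2):
    print(segment.label, segment.start, segment.end, segment.descriptors)
```

Calling `at_level` with a larger depth walks further down the tree, which is one simple way to read the same description at different levels of detail.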
