This paper presents a hierarchical system for audio classification and retrieval based on audio content analysis. The system consists of three stages. The first stage, coarse-level audio classification and segmentation, classifies and segments audio recordings into speech, music, several types of environmental sound, and silence. It does this through morphological and statistical analysis of the temporal curves of short-time audio features. In the second stage, environmental sounds are further classified into finer classes such as applause, rain, and bird sounds. This fine-level classification combines time-frequency analysis of the audio signal with hidden Markov models (HMMs) used as classifiers. The third stage implements query-by-example audio retrieval: given an input audio sample, the system finds similar sounds. The proposed system is shown to achieve an accuracy above 90% for coarse-level audio classification. Examples of fine-level audio classification and of audio retrieval are also provided.
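The coarse-level stage works on curves of short-time features computed frame by frame. The abstract does not name the features, so the sketch below assumes two common ones, short-time energy and zero-crossing rate (ZCR), computed with NumPy. The function names, frame sizes, thresholds, and the toy decision rule in `coarse_label` are hypothetical illustrations of how statistics of such curves can drive a rule-based decision. They are not the system's actual classification rules.

```python
import numpy as np


def frame_signal(x, frame_len=256, hop=128):
    """Split a 1-D signal into overlapping frames, one frame per row."""
    x = np.asarray(x, dtype=float)
    if len(x) < frame_len:
        raise ValueError("signal is shorter than one frame")
    n_frames = 1 + (len(x) - frame_len) // hop
    # Row i holds samples [i*hop, i*hop + frame_len).
    idx = hop * np.arange(n_frames)[:, None] + np.arange(frame_len)[None, :]
    return x[idx]


def short_time_energy(frames):
    """Mean squared amplitude of each frame (the energy curve over time)."""
    return np.mean(frames ** 2, axis=1)


def zero_crossing_rate(frames):
    """Fraction of adjacent sample pairs in each frame whose sign differs."""
    signs = np.signbit(frames).astype(int)
    return np.mean(np.abs(np.diff(signs, axis=1)), axis=1)


def coarse_label(x, frame_len=256, hop=128,
                 silence_thresh=1e-4, zcr_std_thresh=0.1):
    """Toy rule-based decision from statistics of the feature curves.

    Hypothetical: the thresholds and the three labels are illustrative
    assumptions, not the rules of the system described in the paper.
    """
    frames = frame_signal(x, frame_len, hop)
    energy = short_time_energy(frames)
    zcr = zero_crossing_rate(frames)
    if energy.mean() < silence_thresh:
        return "silence"
    # Speech alternates voiced (low-ZCR) and unvoiced (high-ZCR) segments,
    # so its ZCR curve tends to fluctuate more than that of a steady tone.
    if zcr.std() > zcr_std_thresh:
        return "speech"
    return "other"
```

With these assumed settings, an all-zero signal is labeled `"silence"`, a steady sine tone is labeled `"other"`, and a signal alternating between a low tone and white noise is labeled `"speech"` because its ZCR curve swings between low and high values. A practical system would track more features and analyze the shape of each curve over time rather than a single global statistic.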