Hidden Markov Models (HMMs) provide a natural and flexible way to model time-sequential data. The ease with which concatenation and time-warping algorithms can be implemented on HMMs makes them well suited to segmentation and content-based audio classification, as is clear from their extensive and successful use in speech recognition. Speech has a natural basic unit, the phone, which normally limits the number of models to one per phone. Moreover, knowledge of the structure of speech facilitates the choice of the model parameters. When modeling generic audio, on the other hand, the lack of a natural basic unit and the absence of a clear structure make it difficult to select an optimal set of HMMs and to estimate their parameters. In this paper we present different approaches to selecting and estimating the HMM parameters of a set of representative generic audio classes. We compare these approaches in the context of a content-based classification application using the MuscleFish database. The models are first found through frame clustering or by traditional EM techniques under specific selection criteria, such as the Bayesian Information Criterion. Further discriminative training of the initial models considerably improves their performance in the content-based classification task. The results are comparable with those obtained, for the same task, by inherently discriminative classification methods such as support vector machines, while the models preserve the intrinsic flexibility of HMMs.
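As an illustration of selection under the Bayesian Information Criterion, the following is a minimal sketch, not the procedure used in this paper. It assumes fully connected HMMs with one diagonal-covariance Gaussian per state, and it assumes that the training log-likelihood of each candidate model is already available (for example, from EM). The function names and the numeric values are hypothetical.

```python
import math


def hmm_free_parameters(n_states, n_features):
    """Count free parameters of an HMM.

    Assumption: fully connected topology, one diagonal-covariance
    Gaussian per state (a mean and a variance per feature).
    """
    initial = n_states - 1                   # initial-state probabilities
    transitions = n_states * (n_states - 1)  # each row sums to one
    emissions = 2 * n_states * n_features    # means and variances
    return initial + transitions + emissions


def bic(log_likelihood, n_params, n_frames):
    """Schwarz criterion in the 'lower is better' form."""
    return -2.0 * log_likelihood + n_params * math.log(n_frames)


def select_by_bic(log_likelihoods, n_features, n_frames):
    """Pick the number of states with the lowest BIC.

    log_likelihoods maps a candidate number of states to the
    training log-likelihood of the corresponding fitted model.
    """
    scores = {
        n_states: bic(ll, hmm_free_parameters(n_states, n_features), n_frames)
        for n_states, ll in log_likelihoods.items()
    }
    best = min(scores, key=scores.get)
    return best, scores


# Hypothetical log-likelihoods for models with 1, 2, and 3 states.
best, scores = select_by_bic({1: -5000.0, 2: -4800.0, 3: -4790.0},
                             n_features=12, n_frames=1000)
```

In this made-up example the gain in likelihood from two to three states is too small to offset the penalty for the extra parameters, so the two-state model is selected.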