As the multimedia content of the Web increases, techniques to automatically classify this content become more important. We present a system that classifies audio files collected from the Web into one of three categories: speech, music, and other. To classify the audio files, we use the technique of Fisher kernels. The technique, as proposed by Jaakkola (1998), assumes a probabilistic generative model for the data, in our case a Gaussian mixture model (GMM). A discriminative classifier then uses the GMM as an intermediate step: each audio file is mapped to a fixed-length feature vector (its Fisher score, the gradient of the file's log-likelihood with respect to the model parameters). Our discriminative classifier of choice is the support vector machine (SVM). We present classification results on a collection of more than 173 hours of randomly collected Web audio. We believe our results represent one of the first realistic studies of audio classification performance on found data. Our final system achieved a classification rate of 81.8%.
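The Fisher-score step can be sketched as follows. This is a minimal illustration, not the authors' implementation, and it rests on several assumptions: the GMM has diagonal covariances and is already trained, each audio file arrives as a list of per-frame feature vectors, and the gradient is taken only with respect to the component means (the abstract does not say which parameters were used). The Fisher information matrix is approximated by the identity, a common simplification, so the kernel reduces to a dot product. All function names and the toy model parameters are hypothetical. The SVM that would consume these vectors is not shown.

```python
import math
from typing import List, Sequence

Vector = List[float]


def log_gaussian_diag(x: Sequence[float], mean: Sequence[float], var: Sequence[float]) -> float:
    """Log density of a diagonal-covariance Gaussian evaluated at x."""
    total = 0.0
    for xi, mi, vi in zip(x, mean, var):
        total += -0.5 * (math.log(2.0 * math.pi * vi) + (xi - mi) ** 2 / vi)
    return total


def posteriors(x, weights, means, variances) -> Vector:
    """Responsibilities P(component k | x), computed with log-sum-exp for stability."""
    logs = [math.log(w) + log_gaussian_diag(x, m, v)
            for w, m, v in zip(weights, means, variances)]
    top = max(logs)
    exps = [math.exp(lp - top) for lp in logs]
    z = sum(exps)
    return [e / z for e in exps]


def fisher_score_means(frames, weights, means, variances) -> Vector:
    """Gradient of the frame-averaged log-likelihood w.r.t. every component mean.

    For component k and dimension d the entry is
        (1/T) * sum_t gamma_t(k) * (x_td - mu_kd) / sigma_kd^2,
    flattened into one vector of length K * D regardless of the clip length T.
    """
    K, D = len(means), len(means[0])
    score = [0.0] * (K * D)
    for x in frames:
        gamma = posteriors(x, weights, means, variances)
        for k in range(K):
            for d in range(D):
                score[k * D + d] += gamma[k] * (x[d] - means[k][d]) / variances[k][d]
    n = len(frames)
    return [s / n for s in score]


def linear_fisher_kernel(u: Vector, v: Vector) -> float:
    """Fisher kernel with the Fisher information matrix approximated by the identity."""
    return sum(a * b for a, b in zip(u, v))


# Hypothetical toy model: 2 components, 2-D features.
weights = [0.5, 0.5]
means = [[0.0, 0.0], [3.0, 3.0]]
variances = [[1.0, 1.0], [1.0, 1.0]]
clip_a = [[0.1, -0.2], [0.3, 0.0], [-0.1, 0.2]]  # three frames near component 0
clip_b = [[2.9, 3.2], [3.1, 2.8]]                # two frames near component 1
phi_a = fisher_score_means(clip_a, weights, means, variances)
phi_b = fisher_score_means(clip_b, weights, means, variances)
print(len(phi_a), len(phi_b))  # 4 4
print(linear_fisher_kernel(phi_a, phi_b))
```

Because every clip, whatever its duration, maps to a vector of length K × D, clips of different lengths become directly usable by a standard SVM. Averaging over frames keeps long clips from producing larger scores merely because they contain more frames.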