An experiment in audio classification from compressed data

Abstract - In this paper we present an algorithm for automatic classification of sound into speech, instrumental sound/music and silence. The method is based on thresholding features derived from the modulation envelope of the frequency-limited audio signal. Four characteristics are examined for discrimination: the occurrence and duration of energy peaks, rhythmic content, and the level of harmonic content. The proposed algorithm allows classification directly on MPEG-1 audio bitstreams. The performance of the classifier was evaluated on TRECVID test data. The test results are above average among all TREC participants. The approaches adopted by other research groups participating in TREC are also discussed.
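
The thresholding idea summarized in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it works on decoded samples rather than an MPEG-1 bitstream, it uses only two of the four characteristics (occurrence and duration of energy peaks), and every function name, threshold and default value below is a hypothetical choice made for illustration.

```python
import math


def frame_envelope(samples, frame_len):
    """Return the RMS energy of each non-overlapping frame (a crude envelope)."""
    env = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        env.append(math.sqrt(sum(x * x for x in frame) / frame_len))
    return env


def peak_runs(env, level):
    """Return the lengths, in frames, of consecutive runs where env > level."""
    runs, current = [], 0
    for value in env:
        if value > level:
            current += 1
        elif current:
            runs.append(current)
            current = 0
    if current:
        runs.append(current)
    return runs


def classify(env, silence_level=0.01, peak_fraction=0.5, max_speech_peak=5):
    """Label an envelope as 'silence', 'speech' or 'music'.

    All thresholds are hypothetical. The rule assumes speech shows many
    short energy peaks while music shows longer sustained ones, and it
    expects 0 < peak_fraction < 1.
    """
    if not env or max(env) < silence_level:
        return "silence"
    runs = peak_runs(env, peak_fraction * max(env))
    mean_duration = sum(runs) / len(runs)
    return "speech" if mean_duration <= max_speech_peak else "music"
```

A real classifier along these lines would also need the rhythm and harmonic-content measures mentioned in the abstract, and thresholds tuned on labeled data.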
