Home environmental sound recognition based on MPEG-7 features

This paper proposes an environmental sound recognition system based on MPEG-7 audio low-level descriptors (LLDs). Traditional sound recognizers use decision-tree-based methods, whose parameters do not generalize well. HMM-based sound recognizers were introduced to resolve this drawback, but they take the spectrum as the parameter, which results in high-dimensional feature vectors. This paper addresses that shortcoming by applying basis extraction. The recognition rate is about 82% when only the spectrogram is used as the parameter, and it improves to about 95% when three MPEG-7 audio LLDs are used as the parameters of our environmental sound recognizer. These three LLDs are the audio spectrum centroid descriptor, the audio spectrum spread descriptor, and the audio spectrum flatness descriptor.
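To make the three descriptors named in the abstract concrete, the following is a minimal sketch of simplified per-frame versions of spectrum centroid, spread, and flatness. It is an illustration under stated assumptions, not the authors' implementation and not the normative MPEG-7 extraction, which also fixes window sizes, band edges, and per-band flatness. The function names, the naive DFT, the 62.5 Hz lower edge, and the 1 kHz log-frequency reference are assumptions chosen to loosely follow the MPEG-7 style of definition.

```python
import cmath
import math


def power_spectrum(frame):
    """Naive DFT power spectrum for bins 0..N/2 (adequate for short demo frames)."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                  for t, x in enumerate(frame))
        spectrum.append(abs(acc) ** 2)
    return spectrum


def spectral_descriptors(frame, sample_rate, low_edge=62.5, eps=1e-12):
    """Return (centroid, spread, flatness) for one frame.

    Centroid and spread are measured in octaves relative to 1 kHz
    (an assumption modelled on MPEG-7 style log-frequency descriptors).
    Flatness is the geometric mean over the arithmetic mean of the power
    spectrum, computed over the whole retained band rather than per band.
    """
    n = len(frame)
    power = power_spectrum(frame)
    # Keep only bins at or above the assumed lower edge.
    bins = [(k * sample_rate / n, p) for k, p in enumerate(power)
            if k * sample_rate / n >= low_edge]

    total = sum(p for _, p in bins) + eps
    log_f = [math.log2(f / 1000.0) for f, _ in bins]

    # Power-weighted mean of log-frequency.
    centroid = sum(lf * p for lf, (_, p) in zip(log_f, bins)) / total
    # Power-weighted RMS deviation around the centroid.
    spread = math.sqrt(
        sum((lf - centroid) ** 2 * p for lf, (_, p) in zip(log_f, bins)) / total
    )

    # eps keeps the logarithm finite for empty bins.
    floored = [p + eps for _, p in bins]
    geometric = math.exp(sum(math.log(p) for p in floored) / len(floored))
    arithmetic = sum(floored) / len(floored)
    flatness = geometric / arithmetic

    return centroid, spread, flatness
```

In this sketch a pure 1 kHz tone gives a centroid near 0 octaves with very small spread and flatness, while a single-sample impulse (a flat spectrum) gives a flatness close to 1. Per-frame triples like these would then form a much lower-dimensional feature vector than the raw spectrum.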
