Automatic Recognition of Urban Environmental Sound Events

Computer audition is a relatively new and evolving research field with a growing number of applications. It would be of great convenience to live in an environment that adapts automatically based on its “auditory sense”. In this work we propose a novel framework for the automatic recognition of urban sound scenes. Our system employs a hierarchical classification scheme, and the performance of two well-known feature sets, MPEG-7 low-level descriptors (LLDs) and Mel-frequency cepstral coefficients (MFCCs), is compared. A new postprocessing algorithm that enhances the discriminative quality of MPEG-7 features is proposed and shown to improve results. Our approach is evaluated using a compact testing procedure, in which MPEG-7 LLDs reach higher recognition rates than MFCCs.
