A Neural Network Based Framework for Audio Scene Analysis in Audio Sensor Networks

In recent years, audio sensor networks have attracted considerable attention, and audio scene analysis is one of their most important applications. In this paper, we present a neural-network-based framework for analyzing audio scenes in audio sensor networks. In the proposed framework, basic audio events are modeled and detected by Hidden Markov Models (HMMs) on the audio sensor nodes. Each cluster head collects the sensory information from its cluster, and a neural-network-based approach then infers the high-level semantic content of the audio context. This approach combines human knowledge and machine learning in semantic inference: the model parameters are first learned statistically and then adjusted manually according to prior knowledge. We deployed the proposed framework on an audio sensor network and conducted a series of experiments to evaluate its performance. The experimental results show that our method performs well in complex real-world situations.
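The abstract does not specify the audio features, HMM topology, or network architecture. The following is a minimal sketch of the three-stage pipeline it describes. It assumes discrete-observation HMMs scored with the standard scaled forward algorithm at each node, a cluster head that summarizes node reports as an event histogram, and a single-layer softmax network for scene inference. All function names (`detect_event`, `event_histogram`, `scene_probs`, `train`), the event and scene labels, the model parameters, and the +5.0 manual adjustment are hypothetical illustrations, not the authors' method.

```python
import math
from typing import Dict, List, Sequence, Tuple

# A discrete HMM as (initial probs, transition matrix, emission matrix).
HMM = Tuple[List[float], List[List[float]], List[List[float]]]


def hmm_log_likelihood(hmm: HMM, obs: Sequence[int]) -> float:
    """Scaled forward algorithm: returns log P(obs | hmm)."""
    pi, a, b = hmm
    n = len(pi)
    alpha = [pi[i] * b[i][obs[0]] for i in range(n)]
    log_lik = 0.0
    for t in range(len(obs)):
        if t > 0:  # propagate one step, then weight by the emission prob
            alpha = [sum(alpha[j] * a[j][i] for j in range(n)) * b[i][obs[t]]
                     for i in range(n)]
        scale = sum(alpha)  # normalize to avoid numerical underflow
        log_lik += math.log(scale)
        alpha = [v / scale for v in alpha]
    return log_lik


def detect_event(models: Dict[str, HMM], obs: Sequence[int]) -> str:
    """Sensor node: label a segment with its highest-likelihood event HMM."""
    return max(models, key=lambda name: hmm_log_likelihood(models[name], obs))


def event_histogram(reports: List[str], events: List[str]) -> List[float]:
    """Cluster head: fraction of node reports that name each basic event."""
    return [reports.count(e) / len(reports) for e in events]


def scene_probs(w, bias, x) -> List[float]:
    """Single-layer network with a softmax output over scene classes."""
    z = [sum(wi * xi for wi, xi in zip(wk, x)) + bk for wk, bk in zip(w, bias)]
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def train(xs, ys, n_classes, lr=0.5, epochs=500):
    """Statistical-learning step: softmax regression by per-sample SGD."""
    w = [[0.0] * len(xs[0]) for _ in range(n_classes)]
    bias = [0.0] * n_classes
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            p = scene_probs(w, bias, x)
            for k in range(n_classes):
                g = p[k] - (1.0 if k == y else 0.0)  # cross-entropy gradient
                bias[k] -= lr * g
                w[k] = [wi - lr * g * xi for wi, xi in zip(w[k], x)]
    return w, bias


# ---- Toy usage; every name and number below is hypothetical. ----
EVENTS = ["speech", "alarm"]
SCENES = ["meeting", "emergency"]
MODELS: Dict[str, HMM] = {
    "speech": ([0.5, 0.5], [[0.9, 0.1], [0.1, 0.9]], [[0.9, 0.1], [0.8, 0.2]]),
    "alarm": ([0.5, 0.5], [[0.9, 0.1], [0.1, 0.9]], [[0.1, 0.9], [0.2, 0.8]]),
}

# 1. Each node quantizes its audio into symbols and reports one event.
node_obs = [[0, 0, 0, 1, 0, 0], [0, 0, 1, 0, 0, 0], [1, 1, 1, 0, 1, 1]]
reports = [detect_event(MODELS, o) for o in node_obs]
x = event_histogram(reports, EVENTS)

# 2. Learn weights from labelled event histograms.
w, bias = train([[1.0, 0.0], [0.75, 0.25], [0.0, 1.0], [0.25, 0.75]],
                [0, 0, 1, 1], len(SCENES))
before = scene_probs(w, bias, x)

# 3. Manual edit from prior knowledge: any alarm report should push the
#    scene toward "emergency" (hypothetical rule and magnitude).
w[SCENES.index("emergency")][EVENTS.index("alarm")] += 5.0
after = scene_probs(w, bias, x)
print(reports, [round(v, 3) for v in before], [round(v, 3) for v in after])
```

The manual step edits a single learned weight instead of retraining. This mirrors the abstract's description of learning parameters statistically and then adjusting them by hand. A real deployment would still need a policy for how such hand edits interact with any later retraining.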
