On-line sound event detection and recognition based on adaptive background model for robot audition

Correctly detecting and recognizing sound events gives a robot a natural and convenient way to interact with its surroundings through its ears (i.e., microphones). This paper considers sound event detection and recognition in indoor environments where the noise around the robot varies. To handle varying background noise, a novel sound event detection and recognition system is developed. Background model update and re-estimation methods are proposed to handle situations in which the background noise changes slightly or completely, respectively. The detected sound event is then recognized by matching it against noise-corrupted models generated by our proposed combination method, modified Parallel Model Combination (mPMC). mPMC allows the background noise to be modeled by a multi-component Gaussian Mixture Model (GMM), which can represent the background noise more precisely than a Single Gaussian Model (SGM). Experimental results show that our adaptive background modeling method attains excellent detection performance in noise-varying conditions, and that the proposed mPMC using a GMM outperforms conventional PMC using an SGM in recognition performance in real-world environments with varying noise.
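To make the idea of combining a clean sound model with a multi-component noise model concrete, the following is a minimal sketch and not the paper's mPMC. It assumes the well-known log-add approximation of PMC in the log-spectral domain, where only the means are combined, and it assumes that every clean component is paired with every noise GMM component. The function name `log_add_combine` and all parameter names are hypothetical.

```python
import numpy as np

def log_add_combine(clean_means, clean_weights, noise_means, noise_weights):
    """Combine clean and noise mixture means with the log-add approximation.

    Assumption: all means are in the log-spectral domain, so the noisy mean
    is log(exp(mu_clean) + exp(mu_noise)). Variances are not handled here.
    """
    clean_means = np.asarray(clean_means, dtype=float)      # shape (M, D)
    noise_means = np.asarray(noise_means, dtype=float)      # shape (K, D)
    clean_weights = np.asarray(clean_weights, dtype=float)  # shape (M,)
    noise_weights = np.asarray(noise_weights, dtype=float)  # shape (K,)

    # Pair every clean component with every noise component: shape (M, K, D).
    # np.logaddexp(a, b) computes log(exp(a) + exp(b)) in a stable way.
    means = np.logaddexp(clean_means[:, None, :], noise_means[None, :, :])

    # The weight of each pair is the product of the two component weights.
    weights = clean_weights[:, None] * noise_weights[None, :]

    dim = clean_means.shape[1]
    return means.reshape(-1, dim), weights.reshape(-1)

# Hypothetical example: one clean component, two noise components.
combined_means, combined_weights = log_add_combine(
    clean_means=[[0.0, 1.0]],
    clean_weights=[1.0],
    noise_means=[[0.0, 1.0], [-100.0, -100.0]],
    noise_weights=[0.5, 0.5],
)
```

With a single-component noise model (K = 1) this reduces to one noisy component per clean component, which corresponds to the SGM case; a multi-component noise GMM yields M × K noise-corrupted components whose weights still sum to one.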
