Paper information - Sound-event recognition with a companion humanoid

Sound-event recognition with a companion humanoid

In this paper, we address the problem of recognizing everyday sound events in indoor environments with a consumer robot. Sounds are represented in the spectro-temporal domain using the stabilized auditory image (SAI) representation. The SAI is well suited for representing pulse-resonance sounds and has the interesting property of mapping a time-varying signal into a fixed-dimension feature vector space. This allows us to cast sound recognition as a supervised classification problem and to adopt a variety of classification schemes. We present a complete system that takes a continuous signal as input, splits it into significant isolated sounds and noise, and classifies the isolated sounds using a catalogue of learned sound-event classes. The method is validated with a large set of audio data recorded with a humanoid robot in a house. Extended experiments show that the proposed method achieves state-of-the-art recognition scores on a twelve-class problem, while requiring extremely limited memory space and moderate computing power. A first real-time embedded implementation in a consumer robot shows its ability to work in real conditions.
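The pipeline outlined in the abstract (segment a continuous signal, map each isolated sound to a fixed-dimension vector, classify it against learned classes) can be illustrated with a minimal sketch. This is not the authors' implementation: the SAI is not computed here, and log band energies stand in for it purely to show how variable-length sounds become fixed-size vectors. The energy-threshold segmenter, the nearest-centroid classifier, and every function name and parameter below are assumptions made for illustration.

```python
import numpy as np

def segment_events(signal, frame_len=256, threshold_ratio=4.0):
    """Return (start, end) sample ranges whose frame energy exceeds
    threshold_ratio times the median frame energy (assumed to be noise)."""
    n_frames = len(signal) // frame_len
    frames = signal[:n_frames * frame_len].reshape(n_frames, frame_len)
    energy = (frames ** 2).mean(axis=1)
    active = energy > threshold_ratio * np.median(energy)
    # Locate rising and falling edges of the active mask.
    padded = np.concatenate(([False], active, [False]))
    edges = np.flatnonzero(padded[1:] != padded[:-1])
    return [(int(s) * frame_len, int(e) * frame_len)
            for s, e in zip(edges[0::2], edges[1::2])]

def fixed_size_features(segment, n_bands=16):
    """Stand-in for the SAI: pool the power spectrum into n_bands log
    energies, so any segment length yields an n_bands-dimensional vector."""
    power = np.abs(np.fft.rfft(segment)) ** 2
    feats = np.log1p(np.array([b.mean() for b in np.array_split(power, n_bands)]))
    return feats / (np.linalg.norm(feats) + 1e-12)

class NearestCentroid:
    """Toy classifier: one mean feature vector per learned class."""
    def fit(self, X, y):
        X, y = np.asarray(X), np.asarray(y)
        self.labels_ = sorted(set(y.tolist()))
        self.centroids_ = np.stack([X[y == l].mean(axis=0) for l in self.labels_])
        return self

    def predict(self, x):
        dists = np.linalg.norm(self.centroids_ - x, axis=1)
        return self.labels_[int(np.argmin(dists))]

def tone(freq, n_samples, sr=8000):
    """Synthetic sine tone used as a placeholder sound event."""
    return np.sin(2 * np.pi * freq * np.arange(n_samples) / sr)

def demo():
    # Training examples of different lengths map to same-size vectors.
    train = [(tone(420, 1600), "low"), (tone(460, 2400), "low"),
             (tone(2900, 1600), "high"), (tone(3100, 2400), "high")]
    clf = NearestCentroid().fit([fixed_size_features(s) for s, _ in train],
                                [label for _, label in train])
    # Continuous signal: low-level noise with two embedded events.
    rng = np.random.default_rng(0)
    signal = 0.01 * rng.standard_normal(16384)
    signal[2048:4096] += tone(440, 2048)
    signal[8192:10240] += tone(3000, 2048)
    segments = segment_events(signal)
    predictions = [clf.predict(fixed_size_features(signal[a:b]))
                   for a, b in segments]
    return segments, predictions
```

Calling `demo()` segments the synthetic signal and classifies each detected event. Any classifier that accepts fixed-length vectors could replace the nearest-centroid step, which is the flexibility the fixed-dimension representation is meant to provide.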
