Sound-event recognition with a companion humanoid

In this paper we address the problem of recognizing everyday sound events in indoor environments with a consumer robot. Sounds are represented in the spectro-temporal domain using the stabilized auditory image (SAI). The SAI is well suited to representing pulse-resonance sounds and has the useful property of mapping a time-varying signal into a feature vector space of fixed dimension. This allows us to cast sound recognition as a supervised classification problem and to adopt a variety of classification schemes. We present a complete system that takes a continuous signal as input, splits it into significant isolated sounds and noise, and classifies the isolated sounds using a catalogue of learned sound-event classes. The method is validated on a large set of audio data recorded with a humanoid robot in a house. Extended experiments show that the proposed method achieves state-of-the-art recognition scores on a twelve-class problem while requiring extremely limited memory space and moderate computing power. A first real-time embedded implementation in a consumer robot shows its ability to work in real conditions.
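
The pipeline described above (segment a continuous signal, map each isolated sound to a fixed-dimension vector, classify it against learned classes) can be illustrated with a minimal Python sketch. It is not the method of the paper: the SAI is replaced here by normalized log band energies of the magnitude spectrum, the segmentation is a plain frame-energy threshold, and the classifier is a nearest-centroid rule. All function names and parameters (`segment_by_energy`, `fixed_size_features`, `nearest_centroid`, `frame_len`, `threshold_ratio`, `n_bands`) are hypothetical and chosen only for illustration.

```python
import numpy as np

def segment_by_energy(signal, frame_len=256, threshold_ratio=0.1):
    """Return (start, end) sample indices of runs of high-energy frames.

    Assumption: a simple energy threshold stands in for the paper's
    sound/noise segmentation, which the abstract does not detail.
    """
    n_frames = len(signal) // frame_len
    frames = signal[:n_frames * frame_len].reshape(n_frames, frame_len)
    energy = (frames ** 2).mean(axis=1)
    active = energy > threshold_ratio * energy.max()
    segments, start = [], None
    for i, is_active in enumerate(active):
        if is_active and start is None:
            start = i                      # a sound begins
        elif not is_active and start is not None:
            segments.append((start * frame_len, i * frame_len))
            start = None                   # the sound ends
    if start is not None:                  # sound runs to the end of the signal
        segments.append((start * frame_len, n_frames * frame_len))
    return segments

def fixed_size_features(segment, n_bands=16):
    """Map a segment of any length to a unit-norm vector of n_bands values.

    Assumption: log band energies replace the SAI; only the property
    "variable-length input, fixed-dimension output" is shared with it.
    """
    spectrum = np.abs(np.fft.rfft(segment))
    bands = np.array_split(spectrum, n_bands)
    feats = np.log1p(np.array([band.mean() for band in bands]))
    return feats / (np.linalg.norm(feats) + 1e-12)

def nearest_centroid(feature, centroids):
    """Return the label whose centroid is closest to the feature vector."""
    labels = list(centroids)
    dists = [np.linalg.norm(feature - centroids[label]) for label in labels]
    return labels[int(np.argmin(dists))]

def tone(freq, n_samples, sample_rate=16000):
    """Synthetic test sound: a pure sine tone."""
    return np.sin(2 * np.pi * freq * np.arange(n_samples) / sample_rate)

# Toy "catalogue" of two learned classes, built from synthetic tones.
centroids = {
    "low": fixed_size_features(tone(250, 4096)),
    "high": fixed_size_features(tone(2750, 4096)),
}

# Continuous input: silence, a low tone, silence, a longer high tone, silence.
stream = np.concatenate([
    np.zeros(1024), tone(250, 2048), np.zeros(2048),
    tone(2750, 3072), np.zeros(2048),
])

for start, end in segment_by_energy(stream):
    label = nearest_centroid(fixed_size_features(stream[start:end]), centroids)
    print(start, end, label)
```

Because every segment is reduced to a vector of the same length regardless of its duration, any standard supervised classifier could replace the nearest-centroid rule used here.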
