Paper information: Localising speech, footsteps and other sounds using resource-constrained devices

While a number of acoustic localisation systems have been proposed over the last few decades, they have typically either relied on expensive dedicated microphone arrays and workstation-class processing, or been developed to detect a very specific type of sound in a particular scenario. However, people who live and work indoors generate a wide variety of sounds as they interact and move about. These human-generated sounds can be used to infer people's positions without requiring them to wear trackable tags. In this paper, we take a practical yet general approach to localising a number of human-generated sounds. Drawing on the signal processing literature, we identify methods that allow resource-constrained devices in a sensor network to detect, classify and locate acoustic events such as speech, footsteps and objects being placed onto tables. We evaluate the classification and time-of-arrival estimation algorithms using a data set of human-generated sounds that we captured with sensor nodes in a controlled setting. We show that, despite the variety and complexity of the sounds, localising them is feasible for sensor networks, with typical accuracies of half a metre or better. We specifically discuss the processing and networking considerations, and explore the performance trade-offs that can be made to conserve resources further.
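The abstract names time-of-arrival estimation and localisation but not the specific estimator. As a rough illustration only, the Python sketch below estimates each node's arrival-time difference relative to a reference node by taking the peak of a plain (unweighted) cross-correlation, then grid-searches for the 2-D position that best explains those differences. It assumes time-synchronised nodes at known positions, a speed of sound of 343 m/s and whole-sample delays. The function names `estimate_delay` and `locate`, the room layout, the 16 kHz sample rate and the synthetic noise burst are all hypothetical, and the detection and classification stages are omitted. This is not the authors' implementation.

```python
import math
import random

SPEED_OF_SOUND = 343.0  # m/s; assumed value for air at roughly 20 °C


def estimate_delay(ref, sig, max_lag):
    """Return the lag (in samples) of `sig` relative to `ref` that maximises
    their plain, unweighted cross-correlation over [-max_lag, max_lag].
    A positive lag means the sound reached `sig` later than `ref`."""
    best_lag, best_score = 0, -math.inf
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i, r in enumerate(ref):
            j = i + lag
            if 0 <= j < len(sig):
                score += r * sig[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def locate(nodes, tdoas, fs, extent, step=0.05):
    """Grid-search the 2-D point whose predicted arrival-time differences
    (relative to nodes[0], in samples) best match `tdoas` in the
    least-squares sense. `tdoas[k]` pairs with nodes[k + 1]."""
    (x_min, x_max), (y_min, y_max) = extent
    nx = int(round((x_max - x_min) / step)) + 1
    ny = int(round((y_max - y_min) / step)) + 1
    best, best_err = None, math.inf
    for a in range(nx):
        for b in range(ny):
            p = (x_min + a * step, y_min + b * step)
            d_ref = math.dist(p, nodes[0])
            err = 0.0
            for node, tau in zip(nodes[1:], tdoas):
                # Predicted extra path length to this node, converted to samples.
                predicted = (math.dist(p, node) - d_ref) / SPEED_OF_SOUND * fs
                err += (predicted - tau) ** 2
            if err < best_err:
                best, best_err = p, err
    return best


if __name__ == "__main__":
    fs = 16000  # hypothetical sample rate in Hz
    nodes = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0), (4.0, 4.0)]  # hypothetical 4 m x 4 m room
    source = (1.0, 2.0)
    rng = random.Random(1)
    # A short noise burst stands in for a broadband event such as a footstep.
    burst = [rng.uniform(-1.0, 1.0) for _ in range(256)]
    signals = []
    for node in nodes:
        d = round(math.dist(source, node) / SPEED_OF_SOUND * fs)  # whole-sample delay
        signals.append([0.0] * d + burst + [0.0] * (512 - d - len(burst)))
    tdoas = [estimate_delay(signals[0], s, max_lag=200) for s in signals[1:]]
    print(tdoas, locate(nodes, tdoas, fs, extent=((0.0, 4.0), (0.0, 4.0))))
```

Brute-force correlation and an exhaustive grid are used here for readability; on resource-constrained nodes, limiting `max_lag` to the largest physically possible delay and coarsening the grid are the obvious ways to trade accuracy for computation.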

Yukang Guo | Mike Hazas
