Combining SRP-PHAT and two Kinects for 3D Sound Source Localization

The Kinect(TM) was developed to recognize gestures and voice commands through a set of cameras and microphones, respectively. This paper proposes and evaluates a low-cost Sound Source Localization (SSL) solution based on this off-the-shelf equipment. It employs a pair of Kinect devices as an alternative to a dedicated microphone array and runs the Steered Response Power using the PHAse Transform (SRP-PHAT) localization algorithm over the acquired sound data. A fully functional prototype has been implemented and tested under a realistic scenario. Experimental results indicate that, although our approach achieves only limited position estimation, it can accurately point towards the source's direction. Two different high-performance versions of the algorithm have also been implemented to improve overall system performance in the 3D Sound Source Localization setup.
