Proposal and validation of an analytical generative model of SRP-PHAT power maps in reverberant scenarios

Acoustic source localization algorithms based on PHAT filtering have been widely used with good results in reverberant and noisy environments. However, very few studies give a formal explanation of their robustness; most provide only an empirical validation or show results on simulated data. In this work we present a novel analytical model for predicting the behavior of both the SRP-PHAT power maps and the GCC-PHAT functions. The results show that they are affected only by the signal bandwidth, the microphone array topology, and the room geometry, and are independent of the spectral content of the received signal. The proposed model is shown to be valid in reverberant environments and under both far-field and near-field conditions. Using this result, we analyze how the aforementioned factors affect the SRP-PHAT power maps, providing well-supported theoretical and practical considerations. The model is validated on both synthetic and real data, and in all cases it reproduces the SRP-PHAT power maps with high accuracy, in both anechoic and non-anechoic scenarios. This makes it a valuable tool for improving real-world applications related to acoustic localization.

Highlights:
A novel parametric analytical model to predict SRP-PHAT power maps is formulated.
An exhaustive evaluation is done on both synthetic and real data.
Results show high accuracy for very different acoustical and geometrical conditions.
The paper also addresses practical issues in the model implementation.
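The claim that GCC-PHAT is independent of the spectral content of the received signal can be illustrated with a minimal sketch. It uses the textbook GCC-PHAT definition (cross-spectrum normalized to unit magnitude), not the analytical model proposed in the paper. The function name `gcc_phat`, the helper `delayed_pair`, the integer sample delay, and the two test signals are assumptions chosen for illustration only.

```python
import numpy as np

def gcc_phat(x1, x2, max_lag, eps=1e-12):
    """Textbook GCC-PHAT between two equal-length signals.

    Returns (lags, cc); a peak at a positive lag means x2 lags x1.
    """
    n = len(x1) + len(x2)                  # zero-pad to avoid circular wrap-around
    X1 = np.fft.rfft(x1, n=n)
    X2 = np.fft.rfft(x2, n=n)
    cross = X2 * np.conj(X1)               # cross-power spectrum
    cross = cross / (np.abs(cross) + eps)  # PHAT weighting: keep phase, drop magnitude
    cc = np.fft.irfft(cross, n=n)
    # Reorder so that lags run from -max_lag to +max_lag.
    cc = np.concatenate((cc[-max_lag:], cc[:max_lag + 1]))
    lags = np.arange(-max_lag, max_lag + 1)
    return lags, cc

def delayed_pair(s, delay, margin=16):
    """Cut two segments of s so that the second lags the first by `delay` samples."""
    x1 = s[margin:]
    x2 = s[margin - delay:len(s) - delay]
    return x1, x2

rng = np.random.default_rng(0)
true_delay = 5                             # illustrative delay, in samples

# Two sources with very different spectra: white noise and low-pass colored noise.
white = rng.standard_normal(4096)
colored = np.convolve(rng.standard_normal(4096), 0.9 ** np.arange(64), mode="same")

lags, cc_white = gcc_phat(*delayed_pair(white, true_delay), max_lag=20)
_, cc_colored = gcc_phat(*delayed_pair(colored, true_delay), max_lag=20)

delay_white = lags[np.argmax(cc_white)]
delay_colored = lags[np.argmax(cc_colored)]
print(delay_white, delay_colored)
```

Because the PHAT weighting discards the magnitude of the cross-spectrum, both sources produce a sharp peak at the same lag even though their spectra differ. This sketch covers only a single anechoic microphone pair; reverberation and array geometry, which the paper models, are not represented.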
