Robust Source Localization and Enhancement With a Probabilistic Steered Response Power Model

Source localization and enhancement are often treated separately in the array processing literature. One can apply steered response power (SRP) localization to determine the sources' directions of arrival (DOAs), followed by beamforming and Wiener post-filtering to isolate the sources from each other and from ambient interference. We show that when there is significant overlap between directional sources of interest in the time-frequency (TF) plane, traditional SRP localization breaks down. This may occur, for example, when the array is located near a reflector, significant early reflections are present, or the sources are harmonized. We propose a joint solution to the localization and enhancement problems via a probabilistic interpretation of the SRP function. We formulate optimization procedures for (1) a mixture of single-source SRP distributions (MoSRP) and (2) a multi-source SRP distribution (MultSRP). Unlike traditional localization, the latter approach explicitly models source overlap in the TF plane. Results show that the MultSRP model is capable of localizing sources with significant overlap in the TF domain and that either of the proposed methods outperforms standard SRP localization for multiple speakers.
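
For orientation, the conventional SRP localization step mentioned above can be sketched as follows. This is a minimal far-field delay-and-sum sketch for a single STFT frame, not the probabilistic MoSRP or MultSRP models proposed here. The function names, the uniform linear array geometry, the frequency range, and the noiseless single-source test signal are all illustrative assumptions.

```python
import numpy as np

C = 343.0  # assumed speed of sound in m/s


def steering(mic_x, freqs, thetas, c=C):
    """Far-field steering vectors for a linear array on the x-axis.

    Returns an array of shape (n_thetas, n_freqs, n_mics).
    """
    # Relative propagation delay of each mic for each candidate angle.
    delays = np.outer(np.cos(thetas), mic_x) / c            # (T, M)
    phase = -2j * np.pi * freqs[None, :, None] * delays[:, None, :]
    return np.exp(phase)                                     # (T, F, M)


def srp_map(X, mic_x, freqs, thetas, c=C):
    """Steered response power per candidate angle.

    X has shape (n_mics, n_freqs): one STFT frame across the array.
    """
    A = steering(mic_x, freqs, thetas, c)
    # Delay-and-sum beamformer output for every angle and frequency.
    y = np.einsum("tfm,mf->tf", A.conj(), X) / len(mic_x)
    # Sum the output power over frequency.
    return np.sum(np.abs(y) ** 2, axis=1)


# Hypothetical setup: 8 mics spaced 4 cm apart, one source at 60 degrees.
mic_x = 0.04 * np.arange(8)
freqs = np.linspace(200.0, 4000.0, 64)
grid_deg = np.arange(0, 181)
thetas = np.deg2rad(grid_deg)

rng = np.random.default_rng(0)
s = rng.standard_normal(len(freqs)) + 1j * rng.standard_normal(len(freqs))
A0 = steering(mic_x, freqs, np.array([np.deg2rad(60.0)]))[0]  # (F, M)
X = (A0 * s[:, None]).T                                       # (M, F)

power = srp_map(X, mic_x, freqs, thetas)
est_deg = int(grid_deg[np.argmax(power)])
print("estimated DOA (degrees):", est_deg)
```

With a single noiseless source the SRP map has one dominant peak. When several sources share the same TF bins, their contributions add in each bin and simple peak picking on this map becomes less reliable, which is the situation the abstract describes.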
