Real-time microphone array processing for sound source separation and localization

In this paper, the problem of sound source separation and localization is studied using a microphone array. A pure-delay mixture model, which is typical of outdoor environments, is adopted. The proposed approach uses the subspace method to estimate the directions of arrival (DOAs) of the sources from the collected mixtures. Since sound signals are generally broadband, the DOA estimates obtained for a source at different frequencies are used to approximate the probability density function of its DOA, and the maximum likelihood criterion is applied to determine the final DOA estimate. From the estimated DOAs, the corresponding mixing and demixing matrices in the frequency domain are computed, and the source signals are recovered with the inverse short-time Fourier transform (STFT). The algorithm inherits the robustness to noise of the subspace method and supports real-time implementation. Comprehensive simulations and experiments have been conducted to examine various aspects of the algorithm.
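The sketch below illustrates one way the pipeline described in the abstract could be put together; it is not the authors' implementation. It assumes a uniform linear array under the anechoic (pure-delay) mixture model, an illustrative spacing D, sampling rate FS, and a known source count K, uses narrowband MUSIC per frequency bin as the subspace step, and approximates the maximum-likelihood DOA selection by the peaks of a kernel density estimate over the per-bin DOA estimates.

```python
# Minimal sketch: subspace DOA estimation per frequency bin, density-based DOA
# selection, and frequency-domain demixing followed by the inverse STFT.
# D, FS, and K are illustrative assumptions, not values from the paper.
import numpy as np
from scipy.signal import stft, istft, find_peaks
from scipy.stats import gaussian_kde

C = 343.0      # speed of sound (m/s)
FS = 16000     # sampling rate (assumed)
D = 0.04       # microphone spacing (m, assumed)
K = 2          # number of sources (assumed known; AIC/MDL could estimate it)

def steering_matrix(f, doas, n_mics):
    """Pure-delay steering vectors of a ULA at frequency f (DOAs in radians)."""
    m = np.arange(n_mics)[:, None]                 # microphone index
    tau = D * np.sin(doas)[None, :] / C            # delay per mic per source
    return np.exp(-2j * np.pi * f * m * tau)       # (n_mics, len(doas))

def music_doas(R, f, n_mics, grid):
    """Narrowband MUSIC: K DOA estimates from one spatial covariance matrix."""
    _, V = np.linalg.eigh(R)                       # eigenvalues in ascending order
    En = V[:, : n_mics - K]                        # noise subspace
    A = steering_matrix(f, grid, n_mics)
    spec = 1.0 / np.sum(np.abs(En.conj().T @ A) ** 2, axis=0)
    pk, _ = find_peaks(spec)
    if len(pk) < K:                                # fall back to largest values
        return grid[np.argsort(spec)[-K:]]
    return grid[pk[np.argsort(spec[pk])[-K:]]]

def separate(x):
    """x: (n_mics, n_samples) microphone signals -> (DOAs, separated signals)."""
    n_mics = x.shape[0]
    f_axis, _, X = stft(x, fs=FS, nperseg=512)     # X: (n_mics, n_freq, n_frames)
    grid = np.linspace(-np.pi / 2, np.pi / 2, 361)

    # 1) DOA estimates per frequency bin from the subspace method.
    per_bin = []
    for fi, f in enumerate(f_axis):
        if f < 200.0 or f > FS / 2 - 200.0:        # skip unreliable bins
            continue
        Xf = X[:, fi, :]
        R = (Xf @ Xf.conj().T) / Xf.shape[1]       # spatial covariance estimate
        per_bin.extend(music_doas(R, f, n_mics, grid))

    # 2) Approximate the DOA density across frequency; its K highest peaks
    #    stand in for the maximum-likelihood DOA estimates.
    dens = gaussian_kde(np.asarray(per_bin))(grid)
    pk, _ = find_peaks(dens)
    if len(pk) < K:
        doas = grid[np.argsort(dens)[-K:]]
    else:
        doas = grid[pk[np.argsort(dens[pk])[-K:]]]

    # 3) Frequency-domain mixing/demixing matrices from the DOAs, then
    #    reconstruction with the inverse STFT.
    Y = np.zeros((K,) + X.shape[1:], dtype=complex)
    for fi, f in enumerate(f_axis):
        A = steering_matrix(max(f, 1.0), doas, n_mics)   # mixing matrix estimate
        Y[:, fi, :] = np.linalg.pinv(A) @ X[:, fi, :]    # demixing
    _, y = istft(Y, fs=FS, nperseg=512)
    return doas, y
```

Because the demixing matrices in step 3 are built from a single set of DOAs shared by all bins, the usual frequency-domain permutation ambiguity does not arise; for real-time operation, the covariance matrices and the density estimate would be updated over a sliding block of frames rather than over the whole recording.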
