Semi-Supervised Sound Source Localization Based on Manifold Regularization

Conventional speaker localization algorithms, which rely solely on the received microphone signals, are often sensitive to adverse conditions such as high reverberation or low signal-to-noise ratio (SNR). In some scenarios, e.g., in meeting rooms or cars, it can be assumed that the source position is confined to a predefined area and that the acoustic parameters of the environment are approximately fixed. Such scenarios suggest that the acoustic samples from the region of interest have a distinct geometrical structure. In this paper, we show that the high-dimensional acoustic samples indeed lie on a low-dimensional manifold and can be embedded into a low-dimensional space. Motivated by this result, we propose a semi-supervised source localization algorithm based on two-microphone measurements, which recovers the inverse mapping between the acoustic samples and their corresponding locations. The idea is to use an optimization framework based on manifold regularization, which imposes smoothness constraints on possible solutions with respect to the manifold. The proposed algorithm, termed manifold regularization for localization, adapts at runtime as new unlabelled measurements (from unknown source locations) accumulate. Experimental results show superior localization performance compared with a recently presented algorithm based on a manifold learning approach, and with the generalized cross-correlation algorithm as a baseline. The algorithm achieves 2° accuracy in typical noisy and reverberant environments (reverberation time between 200 and 800 ms and SNR between 5 and 20 dB).
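To make the idea of manifold regularization concrete, the following is a minimal sketch of Laplacian-regularized least squares, a standard instance of the general manifold regularization framework. It is not the authors' implementation. The function names (`fit_laprls`, `predict`, `toy_features`), the Gaussian kernel, the reuse of the kernel as graph adjacency, the parameter values, and the synthetic four-dimensional features standing in for real two-microphone acoustic samples are all assumptions made for illustration.

```python
import numpy as np


def gaussian_kernel(A, B, scale):
    """Gaussian kernel matrix between the rows of A and the rows of B."""
    d2 = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(d2, 0.0) / scale)


def fit_laprls(X_lab, y_lab, X_unl, gamma_a=1e-3, gamma_i=1e-1, scale=1.0):
    """Fit f(x) = sum_i alpha_i k(x_i, x) from labelled and unlabelled samples.

    gamma_a weights the usual RKHS norm penalty; gamma_i weights the
    smoothness penalty measured on a graph built over ALL samples.
    """
    X = np.vstack([X_lab, X_unl])
    n_lab, n = len(X_lab), len(X)
    K = gaussian_kernel(X, X, scale)
    W = K.copy()                       # assumption: kernel doubles as adjacency
    L = np.diag(W.sum(1)) - W          # unnormalized graph Laplacian
    J = np.zeros((n, n))
    J[:n_lab, :n_lab] = np.eye(n_lab)  # selects the labelled samples
    y = np.zeros(n)
    y[:n_lab] = y_lab                  # unlabelled targets stay at zero
    A = J @ K + gamma_a * n_lab * np.eye(n) + (gamma_i * n_lab / n ** 2) * (L @ K)
    alpha = np.linalg.solve(A, y)
    return X, alpha


def predict(X_train, alpha, X_new, scale=1.0):
    """Evaluate the learned mapping at new samples (same scale as in fitting)."""
    return gaussian_kernel(X_new, X_train, scale) @ alpha


def toy_features(angle_deg):
    """Hypothetical stand-in for acoustic samples: a 1-D curve embedded in R^4."""
    t = np.deg2rad(angle_deg)
    return np.stack([np.cos(t), np.sin(t), np.cos(2 * t), np.sin(2 * t)], axis=1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    lab_angles = np.linspace(10.0, 170.0, 5)       # few labelled positions
    unl_angles = rng.uniform(10.0, 170.0, 60)      # many unlabelled positions
    X_train, alpha = fit_laprls(toy_features(lab_angles), lab_angles,
                                toy_features(unl_angles))
    test_angles = np.array([45.0, 90.0, 135.0])
    print(predict(X_train, alpha, toy_features(test_angles)))
```

In this sketch the unlabelled samples enter only through the graph Laplacian term, which penalizes mappings that vary quickly between neighbouring samples. Refitting with a larger unlabelled set is one simple way to mimic adaptation as new measurements accumulate, although the paper's actual update scheme may differ.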
