Evaluation of generalized cross-correlation methods for direction of arrival estimation using two microphones in real environments

Abstract The localization of sound sources, and particularly speech, has numerous industrial applications. This has motivated a continuous effort to develop robust direction-of-arrival estimation algorithms that overcome the limitations imposed by real scenarios, such as multiple reflections and undesired noise sources. Time difference of arrival-based methods, and particularly generalized cross-correlation approaches, have been widely investigated in acoustic signal processing, but the technical literature offers few evaluations of their performance in real environments when only two microphones are used. In this work, four generalized cross-correlation methods for localizing speech sources with two microphones have been analyzed in different real scenarios with a stationary noise source. Furthermore, these scenarios have been acoustically characterized in order to relate the behavior of the cross-correlation methods to the acoustic properties of noisy scenarios. The aim of this study is not only to assess the accuracy and reliability of a set of well-known localization algorithms, but also to determine how strongly the different acoustic properties of the room under analysis influence the final results, by incorporating acoustic factors beyond reverberation time and signal-to-noise ratio into the analysis. The results highlight the influence of the analyzed acoustic properties on the performance of these methods.
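As a rough illustration of the two-microphone generalized cross-correlation (GCC) approach described above, the sketch below implements the phase transform (PHAT) weighting, one widely used GCC variant. The abstract does not name the four methods that were evaluated, so this is not presented as any one of them. The sketch assumes a far-field source, two omnidirectional microphones spaced `mic_distance` metres apart, and a speed of sound of 343 m/s; the function names `gcc_phat` and `doa_from_tdoa` and their parameters are hypothetical.

```python
import numpy as np


def gcc_phat(x1, x2, fs, max_tau=None):
    """Estimate the delay (in seconds) of x2 relative to x1 with GCC-PHAT.

    A positive result means x2 lags x1. Hypothetical helper for illustration.
    """
    # Zero-pad to the sum of lengths so the FFT yields a linear (not circular) correlation.
    n = len(x1) + len(x2)
    X1 = np.fft.rfft(x1, n=n)
    X2 = np.fft.rfft(x2, n=n)
    cross = X2 * np.conj(X1)
    # PHAT weighting: discard magnitude and keep only the phase of the cross-spectrum.
    cross /= np.abs(cross) + 1e-12
    cc = np.fft.irfft(cross, n=n)

    # Limit the search to physically plausible lags if max_tau is given.
    max_shift = n // 2
    if max_tau is not None:
        max_shift = min(int(fs * max_tau), max_shift)
    # Reorder so that index max_shift corresponds to zero lag.
    cc = np.concatenate((cc[len(cc) - max_shift:], cc[:max_shift + 1]))
    shift = int(np.argmax(np.abs(cc))) - max_shift
    return shift / float(fs)


def doa_from_tdoa(tau, mic_distance, c=343.0):
    """Convert a TDOA to a far-field angle in degrees from broadside.

    Assumes a plane wave; positive angles point toward microphone 1
    (the microphone the wavefront reaches first when tau > 0).
    """
    s = np.clip(c * tau / mic_distance, -1.0, 1.0)
    return float(np.degrees(np.arcsin(s)))
```

For example, with white noise `s`, `x1 = s` and `x2` equal to `s` delayed by 5 samples at 16 kHz, `gcc_phat(x1, x2, 16000)` returns 5/16000 s, and `doa_from_tdoa` maps that delay to an angle for a given microphone spacing. Passing `max_tau=mic_distance / 343.0` restricts the peak search to delays the array geometry can physically produce, which helps suppress spurious peaks caused by reflections.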
