Speakers' direction finding using estimated time delays in the frequency domain

Speaker localization is an important issue in the study of human communication and is related to a variety of practical applications. When two or more speakers speak simultaneously, finding the direction of arrival of the speech signals is a complicated task. First, the spectral separation between different speech signals was quantified: on average, about 40% of the spectral information in the 0-5 kHz band was found to differ significantly (by at least 10 dB) between any two speakers, even when they speak the same utterance at the same time and with the same intensity. The signals were then analyzed in the frequency domain to transform the problem into a set of single-source, single-frequency problems. This made it possible to apply a time delay direction finding (TDDF) algorithm (Berdugo et al., J. Acoust. Soc. Am. 105 (6) (1999) 3355). Next, a new "fusion" algorithm was developed that extends the solution to separate the speech signals of two speakers at low signal-to-noise ratios (SNRs). The results obtained in simulations, as well as in actual experiments, demonstrated high angular resolution between two speakers (approximately 20° for a 10 cm array extent), even at low SNRs. This algorithm may be suitable for various applications, such as video conferencing and hearing aids.
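To make the frequency-domain decomposition concrete, the following Python sketch (an illustration only, not the authors' TDDF or fusion implementation) estimates one time delay, and hence one direction, per frequency bin from the cross-spectrum of a two-microphone pair, then fuses the per-bin estimates with an energy-weighted histogram to pick two well-separated directions. All function names and parameter values (d = 0.10 m to match the 10 cm array extent, c = 343 m/s, the minimum peak separation) are illustrative assumptions.

import numpy as np

def doa_per_bin(x1, x2, fs, d=0.10, c=343.0, n_fft=1024):
    """One direction-of-arrival estimate per frequency bin, from the
    cross-spectrum of two microphones separated by d meters."""
    # Frequency-domain analysis: speech spectra are sparse, so each bin
    # is treated as a single-source, single-frequency problem.
    win = np.hanning(len(x1))
    X1 = np.fft.rfft(x1 * win, n_fft)
    X2 = np.fft.rfft(x2 * win, n_fft)
    freqs = np.fft.rfftfreq(n_fft, 1.0 / fs)

    # Keep only bins below the spatial-aliasing limit c/(2d), where the
    # cross-spectrum phase maps to a unique inter-microphone delay.
    valid = (freqs > 0) & (freqs < c / (2.0 * d))
    cross = X1[valid] * np.conj(X2[valid])
    tau = np.angle(cross) / (2.0 * np.pi * freqs[valid])

    # Far-field geometry: tau = (d / c) * sin(theta).
    s = np.clip(c * tau / d, -1.0, 1.0)
    angles = np.degrees(np.arcsin(s))
    weights = np.abs(cross)  # weight each bin by its spectral energy
    return angles, weights

def fuse_two_speakers(angles, weights, n_bins=90, min_sep=10.0):
    """Toy fusion step: energy-weighted histogram of per-bin angles,
    returning the two strongest peaks at least min_sep degrees apart."""
    hist, edges = np.histogram(angles, bins=n_bins, range=(-90.0, 90.0),
                               weights=weights)
    centers = 0.5 * (edges[:-1] + edges[1:])
    order = np.argsort(hist)[::-1]
    first = centers[order[0]]
    second = next((c for c in centers[order[1:]]
                   if abs(c - first) >= min_sep), first)
    return sorted((first, second))

With real speech one would frame the signals, pool the per-frame histograms over time, and only then pick the peaks; the histogram stands in here for the paper's fusion algorithm, which differs in detail.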

[1] A. Piersol, "Time delay estimation using phase data," 1981.

[2] N. Ohnishi et al., "A biomimetic system for localization and separation of multiple sound sources," Proc. IMTC/94, IEEE Instrumentation and Measurement Technology Conference, 1994.

[3] P. M. Schultheiss et al., "Optimum passive bearing estimation in a spatially incoherent noise environment," 1969.

[4] J. Mullennix et al., "Talker Variability in Speech Processing," 1997.

[5] H. Wang et al., "Voice source localization for automatic camera pointing system in videoconferencing," Proc. IEEE ICASSP, 1997.

[6] G. E. Peterson et al., "Control methods used in a study of the vowels," 1951.

[7] H.-S. Kim et al., "Using a real-time, tracking microphone array as input to an HMM speech recognizer," Proc. IEEE ICASSP, 1998.

[8] M. Kaveh et al., "The statistical performance of the MUSIC and the minimum-norm algorithms in resolving plane waves in noise," IEEE Trans. Acoust. Speech Signal Process., 1986.

[9] J. Huang et al., "Spatial localization of sound sources: azimuth and elevation estimation," Proc. IMTC/98, IEEE Instrumentation and Measurement Technology Conference, 1998.

[10] D. E. Sturim et al., "Tracking multiple talkers using microphone-array measurements," Proc. IEEE ICASSP, 1997.

[11] H. Kiukaanniemi, "Individual differences in the long-term speech spectrum," Folia Phoniatrica, 1982.

[12] S. Sideman et al., "A targeting-and-extracting technique to enhance hearing in the presence of competing speech," J. Acoust. Soc. Am., 1997.

[13] R. J. Renomeron et al., "Small-scale matched filter array processing for spatially selective sound capture," 1997.

[14] M. A. Doron et al., "On direction finding of an emitting source from time delays," 1999.