Many applications, such as hands-free videoconferencing, speech processing in large rooms, and acoustic echo cancellation, use microphone arrays to track speaker locations in real time. A speaker is a wideband source that may lie in either the near field or the far field of the array. Current source localization approaches based on neural networks can meet real-time constraints but assume far-field narrowband sources. In this paper, we (1) apply neural networks to direction-of-arrival estimation for near-field and far-field wideband speaker localization, and (2) compute the instantaneous cross-power spectra between adjacent pairs of sensors to form the feature vector. We optimized the overall speaker localization system off-line to yield an absolute error of less than 6 degrees at an SNR of 10 dB and a sampling rate of 8000 Hz at each sensor. When performing speaker localization in real time, the system would require 1 MFLOP/s.
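As an illustration of step (2), the following is a minimal sketch of how instantaneous cross-power spectra between adjacent sensor pairs could be turned into a feature vector. It is not the paper's implementation: the function name `cross_power_features`, the magnitude normalization, and the choice to concatenate real and imaginary parts are all assumptions made for this example, and the abstract does not specify which frequency bins or which representation feed the network.

```python
import numpy as np

def cross_power_features(frame, n_bins=None):
    """Build a feature vector from one multichannel frame.

    frame  : array of shape (n_sensors, n_samples), one row per microphone.
    n_bins : optional number of low-frequency bins to keep (hypothetical knob).
    """
    frame = np.asarray(frame, dtype=float)
    # One-sided spectrum of each sensor signal for this single frame.
    spectra = np.fft.rfft(frame, axis=1)
    # Instantaneous cross-power spectrum of each adjacent pair (i, i+1).
    cross = spectra[:-1] * np.conj(spectra[1:])
    if n_bins is not None:
        cross = cross[:, :n_bins]
    # Assumption: discard magnitude so that only inter-sensor phase remains.
    unit = cross / np.maximum(np.abs(cross), 1e-12)
    # Assumption: real parts first, then imaginary parts, flattened.
    return np.concatenate([unit.real.ravel(), unit.imag.ravel()])

# Synthetic example: 4 sensors, 256-sample frame at 8000 Hz, and a tone
# that reaches each successive sensor one sample later.
fs = 8000
n_sensors, n_samples = 4, 256
n = np.arange(n_samples)
tone_hz = 8 * fs / n_samples  # tone centred exactly on FFT bin 8
frame = np.stack([np.cos(2 * np.pi * tone_hz * (n - i) / fs)
                  for i in range(n_sensors)])
features = cross_power_features(frame)
```

In this sketch the phase of each cross-power bin encodes the delay between neighbouring microphones, which is the quantity a direction-of-arrival estimator would need; a real wideband system would presumably restrict the feature vector to bins carrying speech energy.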