A new algorithm for the estimation of talker azimuthal orientation using a large aperture microphone array

Knowing the orientation of a talker allows a large-aperture microphone array to select and control cameras better in a teleconferencing situation, to improve source-location estimation, and, often, to improve beamforming. In 2004, we introduced a baseline algorithm for estimating orientation azimuth. Recent testing showed that the baseline algorithm performed poorly when the source was not at the center of the array's focal area. Here, we describe a second-generation algorithm, A2, that overcomes many of the baseline's shortfalls. It still extracts the estimate from microphone energies, but improves on the baseline by 1) using a narrow-band, high-frequency analysis rather than the baseline's broad band, 2) using spectral subtraction to remove uncorrelated noise, and 3) fitting the processed microphone energies to an ideal model of the direct-wave energy. Most importantly, step 3) applies inverse-square-law attenuation to the direct wave only, which the baseline did not. Results from an advanced simulator are presented to illustrate the issues. A2 and the baseline algorithm are then compared on about 60 direct recordings of a human talker in a typical, noisy environment using our 448-microphone array; these show that A2 is a significant improvement.
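To make step 3) concrete, here is a minimal Python sketch of the model-fitting idea, not the paper's implementation: it assumes a first-order cardioid talker directivity, a known source position, and per-microphone band energies already cleaned by spectral subtraction (step 2), and it grid-searches the orientation azimuth that best fits those energies to a direct-wave model with proper inverse-square-law attenuation. The function names and the cardioid choice are illustrative assumptions.

import numpy as np

def cardioid_gain(psi):
    # Assumed first-order cardioid talker directivity: 1 on-axis, 0 behind.
    return 0.5 * (1.0 + np.cos(psi))

def estimate_orientation(mic_xy, src_xy, energies, n_candidates=360):
    # Fit E_i = A * D(theta_i - phi)^2 / r_i^2 over candidate azimuths phi.
    # mic_xy: (M, 2) microphone positions; src_xy: (2,) known talker position;
    # energies: (M,) noise-reduced direct-wave band energies.
    d = mic_xy - src_xy                    # talker-to-microphone vectors
    r2 = np.sum(d**2, axis=1)              # squared ranges (inverse-square law)
    theta = np.arctan2(d[:, 1], d[:, 0])   # azimuth of each mic from the talker
    best_phi, best_err = 0.0, np.inf
    for phi in np.linspace(-np.pi, np.pi, n_candidates, endpoint=False):
        model = cardioid_gain(theta - phi)**2 / r2   # unscaled model energies
        A = (energies @ model) / (model @ model)     # closed-form amplitude fit
        err = np.sum((energies - A * model)**2)
        if err < best_err:
            best_phi, best_err = phi, err
    return best_phi

# Synthetic check: a talker at the origin facing 30 degrees.
rng = np.random.default_rng(0)
mics = rng.uniform(-5.0, 5.0, size=(64, 2))
src = np.zeros(2)
d = mics - src
clean = cardioid_gain(np.arctan2(d[:, 1], d[:, 0]) - np.deg2rad(30.0))**2 \
        / np.sum(d**2, axis=1)
noisy = clean * (1.0 + 0.05 * rng.standard_normal(len(clean)))
print(np.rad2deg(estimate_orientation(mics, src, noisy)))  # close to 30

In practice the fit would be restricted to the narrow high-frequency band of step 1), where the talker's directivity is most pronounced, and the fixed 1-degree grid could be replaced by a finer grid or a local refinement since the search is cheap.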
