Spatial acoustic cues for the auditory perception of speaker's facing direction

In pursuit of highly realistic human-to-human telecommunication technology, this study explored listeners’ ability to perceive the facing direction of a human speaker by hearing alone. Listeners’ performance was assessed in an anechoic chamber. A male speaker sat on a pivot chair and spoke a short sentence while facing a direction chosen at random from eight azimuthal angles or three elevation angles. Twelve blindfolded listeners heard the sentence at a distance of either 1.2 or 2.4 m from the speaker and were asked to indicate the speaker’s facing direction. In separate sessions, the speaker continuously changed his facing angle while speaking, and the listeners indicated the perceived direction of horizontal rotation (clockwise or counterclockwise) or vertical rotation (upward or downward). Overall, the listeners’ average response errors were 23.5 degrees for azimuth and 12.9 degrees for elevation. These values were comparable to or better than those obtained in previous studies using a loudspeaker. The average correct-response rates for rotation direction (horizontal or vertical) were 80% or higher. To identify the acoustic cues underlying the listeners’ accurate performance, the acoustic transfer characteristics from the speaker’s mouth to the listener’s ears were measured by the cross-spectral method. Finer transfer functions were further obtained for a few conditions of particular interest by numerical simulation using the finite-difference time-domain method. The results suggested that the major cues included, but were not limited to, the overall level and spectral tilt for front-back and up-down judgments, and the interaural level difference for left-right judgments.
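The average response error reported above can be illustrated with a minimal sketch. The abstract does not state how the error was computed, so this assumes that azimuth error is the absolute angular difference between the true and reported facing directions, wrapped so that it never exceeds 180 degrees. The function names and the trial values are hypothetical and are not data from the study.

```python
def angular_error(response_deg, target_deg):
    """Smallest absolute difference between two azimuths, in degrees (0 to 180)."""
    diff = (response_deg - target_deg) % 360.0
    return min(diff, 360.0 - diff)


def mean_angular_error(trials):
    """Average wrapped error over (target, response) pairs given in degrees."""
    errors = [angular_error(resp, tgt) for tgt, resp in trials]
    return sum(errors) / len(errors)


# Hypothetical trials (true facing azimuth, listener's response); not study data.
example_trials = [(0, 45), (315, 0), (180, 180), (90, 135)]
print(mean_angular_error(example_trials))  # prints 33.75
```

The wrap-around step matters for azimuth because a response of 0 degrees to a target of 315 degrees is a 45-degree error, not a 315-degree one.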
