Sound localization in a median plane using an avatar robot "TeleHead" with synchronization of a listener's horizontal head rotation

Demand for communication systems that provide a high sense of presence has been growing. For such communication, it is important to acoustically capture comprehensive sound-space information of a remote place and transmit it to the local site. The authors developed an avatar robot, a simplified version of the TeleHead proposed by Toshima et al., whose head rotates synchronously with the listener's horizontal head rotation. Using this robot, the authors investigated perceptual sound localization accuracy in the median plane at a remote site. Results show that sound localization accuracy improves when the robot rotates synchronously with the listener's head. A further sound localization test examined the effect of manipulating the ratio between the listener's actual rotation angle and that of the robot, which was varied systematically. Results show that the ratio only slightly affects the accuracy of perceived elevation angles, suggesting that the cues provided by head rotation are highly robust.

INTRODUCTION

In the near future, advanced and natural communication with people in remote places might be realized using systems that provide a high sense of presence. For such communication, it is important to capture comprehensive sound-space information of a remote place and transmit it to the local site. Another important point for interactive communication is to reproduce the sound space so that the synthesized sound field responds to the listener's movements. Several studies have revealed that the accuracy of sound localization in the horizontal plane can be improved when movement of the head and body is allowed, in both real [1, 2, 3, 8] and virtual [4, 5, 6, 7] environments.

Several methods have shown great potential for realizing responsiveness to listener movement in sound reproduction. Wave field synthesis (WFS) [9] and boundary surface control (BoSC) [10], which are based on the Kirchhoff–Helmholtz integral equation, permit a listener within the controlled area to change position and move the head freely. The Ambisonics technique [11] also allows head movement at and near the listening sweet spot. However, these methods require many loudspeakers to control the sound field with high accuracy, and numerous microphones are also needed for recording. Sakamoto et al. proposed a novel sound-capturing system, SENZI [12], in which a spherical array with many microphones captures three-dimensional sound-space information and the microphone signals are converted to binaural signals using simple digital signal processing. Because the spherical microphone array is symmetric, the signal processing can be switched according to head movement sensed with a position sensor.

An important point in reproducing a sound field is to synthesize it so that the head-related transfer functions (HRTFs) [13] of the listener are properly convolved before the sounds arrive at the eardrums. In the WFS, BoSC, and Ambisonics techniques, HRTFs are convolved naturally at the listening point because the listener's own head and body are present there. In contrast, the SENZI system requires measurement or numerical estimation of the listener's HRTFs, because HRTFs depend closely on the shapes of the listener's head, body, and ears. However, measuring the HRTFs of a specific listener requires a large measurement apparatus as well as much time and effort [14], and numerical estimation demands substantial computational resources [15, 16, 17, 18].
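Because binaural reproduction with HRTFs and responsiveness to head rotation both recur in the discussion above, the following minimal Python sketch illustrates the general idea: a mono source is convolved with the head-related impulse response (HRIR) pair selected for the source direction relative to the current head orientation. The function name, the hrir_bank dictionary, and the angle conventions are hypothetical illustrations for this sketch; they are not the processing used by SENZI or TeleHead.

    import numpy as np

    def render_binaural(source, hrir_bank, source_azimuth_deg, head_yaw_deg):
        """Convolve a mono source with the HRIR pair nearest to the source
        direction as seen from the (possibly rotated) head.

        hrir_bank is assumed to map integer azimuths in degrees to
        (left_hrir, right_hrir) pairs of 1-D NumPy arrays.
        """
        # Source direction relative to the rotated head, wrapped to [0, 360).
        relative_deg = (source_azimuth_deg - head_yaw_deg) % 360

        # Pick the nearest measured direction using circular distance.
        def circ_dist(a):
            d = abs(a - relative_deg) % 360
            return min(d, 360 - d)

        nearest = min(hrir_bank.keys(), key=circ_dist)
        hrir_left, hrir_right = hrir_bank[nearest]

        # Linear convolution of the source with each ear's impulse response.
        left = np.convolve(source, hrir_left)
        right = np.convolve(source, hrir_right)
        return np.stack([left, right])

When the head yaw changes, the relative direction changes and a different HRIR pair is selected, which is the basic mechanism by which head-tracked binaural systems remain responsive to listener movement.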
TeleHead [19] at a remote site moves synchronously with a person's various head movements, which are sensed by a position sensor attached to the person's head. The person can listen to sounds through TeleHead as an avatar by wearing headphones whose inputs are connected to microphones at TeleHead's ears. The head of TeleHead is exchangeable, so a listener can use a replica of her own head as the avatar at the remote site. Because two microphones are installed at the ear entrances of the dummy head, the listener's own HRTFs are naturally convolved when the dummy head replicates her own head. Although the listener must prepare such a head replica in advance, TeleHead appears promising for enabling a listener to sense the whole remote sound space interactively, as an avatar, with her own HRTFs. Several researchers have reported that head movement during listening can enhance the accuracy of sound localization and the reality, or sense of presence, of the perceived sound space in both real and virtual environments [1, 2, 3, 4, 7]. Toshima et al. also investigated sound localization accuracy using TeleHead in the horizontal plane and the median plane [20]. They used head shapes of two types and discussed their effects, pointing out that localization accuracy can be improved using TeleHead even when the head shape is not the listener's own. They also reported that synchronization with the listener's head movements was important. However, in their experiment, TeleHead could
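To make the synchronization described in this introduction and the manipulated-ratio condition of the localization test concrete, the following Python sketch scales the sensed listener yaw by a ratio before commanding the robot's head. The callback names, the update period, and the scaling scheme are assumptions made for illustration only; they do not describe the authors' implementation.

    import time

    def follow_listener(read_listener_yaw_deg, command_robot_yaw_deg,
                        ratio=1.0, period_s=0.01):
        """Drive the robot's horizontal head rotation from the sensed listener yaw.

        read_listener_yaw_deg: hypothetical callback returning the listener's
                               head yaw in degrees (e.g., from a head tracker).
        command_robot_yaw_deg: hypothetical callback sending a yaw command
                               to the robot's head actuator.
        ratio:                 robot rotation divided by listener rotation;
                               1.0 corresponds to exact synchronization, while
                               other values mimic manipulated-ratio conditions.
        """
        while True:
            listener_yaw = read_listener_yaw_deg()
            command_robot_yaw_deg(ratio * listener_yaw)
            time.sleep(period_s)  # fixed update rate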

REFERENCES

[1] Iwaki Toshima et al., "Sound Localization During Head Movement Using an Acoustical Telepresence Robot: TeleHead," Advanced Robotics, 2009.

[2] Yukio Iwaya et al., "Effects of head movement on front-back error in sound localization," 2003.

[3] Yukio Iwaya et al., "Comparison of sound localization performance between virtual and real three-dimensional immersive sound field," 2009.

[4] Yukio Iwaya et al., "Estimation of detection threshold of system latency of virtual auditory display," 2007.

[5] Makoto Otani et al., "Fast calculation system specialized for head-related transfer function based on boundary element method," The Journal of the Acoustical Society of America, 2006.

[6] Shuichi Sakamoto et al., "SENZI and ASURA: New High-Precision Sound-Space Sensing Systems based on Symmetrically Arranged Numerous Microphones," Second International Symposium on Universal Communication, 2008.

[7] W. R. Thurlow et al., "Effect of induced head movements on localization of direction of sounds," The Journal of the Acoustical Society of America, 1967.

[8] Mark A. Poletti, "Three-Dimensional Surround Sound Systems Based on Spherical Harmonics," 2005.

[9] Yukio Iwaya et al., "Influence of Large System Latency of Virtual Auditory Display on Behavior of Head Movement in Sound Localization Task," 2008.

[10] A. Berkhout, "Acoustic control by wave field synthesis," 1993.

[11] Parham Mokhtari et al., "Comparison of Simulated and Measured HRTFs: FDTD Simulation Using MRI Head Data," 2007.

[12] S. Ise, "A principle of sound field control based on the Kirchhoff-Helmholtz integral equation and the theory of inverse systems," 1999.

[13] S. Perrett et al., "The effect of head rotations on vertical plane sound localization," The Journal of the Acoustical Society of America, 1997.

[14] B. F. Katz, "Boundary element method calculation of individual head-related transfer function. I. Rigid model calculation," The Journal of the Acoustical Society of America, 2001.

[15] W. R. Thurlow et al., "Head movements during sound localization," The Journal of the Acoustical Society of America, 1967.