Bringing the Scene Back to the Tele-operator: Auditory Scene Manipulation for Tele-presence Systems

In a tele-operated robot system, reproducing the auditory scene, which conveys the 3D spatial information of sound sources in the remote robot environment, is important for transmitting a sense of remote presence to the tele-operator. We proposed a tele-presence system that reproduces and manipulates the auditory scene of a remote robot environment, based on the spatial information of human voices around the robot, matched to the operator's head orientation. On the robot side, voice sources are localized and separated using multiple microphone arrays and human tracking technologies; on the operator side, the operator's head movement is tracked and used to relocate the spatial positions of the separated sources. Interaction experiments with humans in the robot environment indicated that, compared with a baseline system presenting stereo binaural sound captured by two microphones located at the humanoid robot's ears, the proposed system achieved significantly higher accuracy rates for perceived sound direction, and higher subjective scores for sense of presence and listenability. We also proposed three user interfaces for augmented auditory scene control. Evaluation results indicated higher subjective scores for sense of presence and usability for two of these interfaces (control of voice amplitudes based on virtual robot positioning, and amplification of voices in the frontal direction).
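The operator-side relocation step, together with the two better-rated interfaces, can be illustrated with a minimal Python sketch. It assumes that each separated voice arrives with a 2D position (metres) in the remote-room frame and that the head tracker reports a yaw angle in degrees in that same frame. It substitutes simple constant-power stereo panning for the HRTF-based binaural rendering a real system would more likely use. All function names, the inverse-distance law, the ±30° frontal window and the +6 dB boost are hypothetical choices for illustration, not values or methods taken from the paper. `listener_xy` plays the role of the (possibly virtual) robot position.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical representation of a separated voice: (x, y) position in metres
# in the remote-room frame, plus its mono samples.
Source = Tuple[float, float, Sequence[float]]


def wrap_deg(angle: float) -> float:
    """Wrap an angle in degrees to the interval [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0


def source_azimuth(listener_xy: Tuple[float, float], x: float, y: float) -> float:
    """Azimuth of (x, y) seen from the listener, in degrees counter-clockwise from +x."""
    return math.degrees(math.atan2(y - listener_xy[1], x - listener_xy[0]))


def relative_azimuth(source_az: float, head_yaw: float) -> float:
    """Source azimuth relative to the operator's head; positive means to the left."""
    return wrap_deg(source_az - head_yaw)


def pan_gains(rel_az: float) -> Tuple[float, float]:
    """Constant-power stereo panning (assumed stand-in for HRTF filtering).

    Rear sources are mirrored onto the frontal half-plane, so plain panning
    cannot by itself convey front/back differences.
    """
    az = rel_az
    if az > 90.0:
        az = 180.0 - az
    elif az < -90.0:
        az = -180.0 - az
    theta = math.pi / 4.0 * (1.0 + az / 90.0)  # 0 = full right, pi/2 = full left
    return math.sin(theta), math.cos(theta)  # (left gain, right gain)


def frontal_gain(rel_az: float, width_deg: float = 30.0, boost_db: float = 6.0) -> float:
    """Boost voices within +/- width_deg of the facing direction (hypothetical values)."""
    return 10.0 ** (boost_db / 20.0) if abs(rel_az) <= width_deg else 1.0


def distance_gain(listener_xy: Tuple[float, float], x: float, y: float,
                  ref_dist: float = 1.0) -> float:
    """Inverse-distance amplitude law, clamped to 1 inside ref_dist (an assumed law)."""
    d = math.hypot(x - listener_xy[0], y - listener_xy[1])
    return ref_dist / max(d, ref_dist)


def render(sources: List[Source], listener_xy: Tuple[float, float],
           head_yaw: float, frontal_boost: bool = False) -> Tuple[List[float], List[float]]:
    """Mix separated voices into a stereo pair for the operator's current head yaw."""
    n = max((len(samples) for _, _, samples in sources), default=0)
    left = [0.0] * n
    right = [0.0] * n
    for x, y, samples in sources:
        # Relocate the source relative to where the operator is currently facing.
        rel = relative_azimuth(source_azimuth(listener_xy, x, y), head_yaw)
        # Amplitude depends on distance from the (virtual) robot position.
        gain = distance_gain(listener_xy, x, y)
        if frontal_boost:
            gain *= frontal_gain(rel)
        gl, gr = pan_gains(rel)
        for i, sample in enumerate(samples):
            left[i] += gain * gl * sample
            right[i] += gain * gr * sample
    return left, right
```

For example, a voice at (0, 2) heard from (0, 0) is panned fully left when `head_yaw=0.0`, and becomes centred once the operator turns to `head_yaw=90.0`; moving `listener_xy` closer to a speaker raises that voice's amplitude, which is one possible reading of the virtual-robot-positioning interface.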
