Digital Ventriloquism: Giving Voice to Everyday Objects

Smart speakers with voice agents are becoming increasingly common. However, the agent's voice always emanates from the device itself, even when the information it conveys is contextually and spatially relevant elsewhere. Digital Ventriloquism allows smart speakers to render sound onto everyday objects, such that they appear to speak and to be interactive, without any modification of the objects or the environment. To achieve this, we use a highly directional pan-tilt ultrasonic array. By modulating a 40 kHz ultrasonic carrier, we can emit sound that is inaudible "in flight" and demodulates to audible frequencies when it strikes a surface, through acoustic parametric interaction. This makes it appear as though the sound originates from the object rather than from the speaker. In a study in which we projected speech onto five objects in three environments, participants correctly identified the source object 92% of the time and correctly repeated the spoken message 100% of the time, demonstrating that our digital ventriloquy is both directional and intelligible.
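
To make the modulation step concrete, the sketch below is a minimal illustration in Python, not the authors' published implementation: it amplitude-modulates an audio waveform onto a 40 kHz ultrasonic carrier. The sample rate, modulation depth, and square-root (Berktay) envelope preprocessing are assumptions drawn from common parametric-array practice, since the abstract does not specify the modulation scheme.

    import numpy as np

    FS = 192_000          # sample rate, high enough to represent a 40 kHz carrier (assumed)
    CARRIER_HZ = 40_000   # ultrasonic carrier frequency stated in the abstract

    def modulate(audio: np.ndarray, depth: float = 0.8) -> np.ndarray:
        """Amplitude-modulate an audio signal onto the ultrasonic carrier.

        The emitted signal is inaudible in flight; nonlinear propagation in
        air (the parametric effect) demodulates the envelope back to audible
        frequencies when the beam strikes a surface.
        """
        audio = audio / np.max(np.abs(audio))          # normalize to [-1, 1]
        t = np.arange(len(audio)) / FS
        carrier = np.sin(2 * np.pi * CARRIER_HZ * t)
        # Square-root (Berktay) preprocessing: a common technique that reduces
        # harmonic distortion in the self-demodulated signal relative to plain
        # double-sideband AM. Assumed here, not taken from the paper.
        envelope = np.sqrt(1.0 + depth * audio)
        return envelope * carrier

    # Example: a 1 kHz test tone modulated onto the 40 kHz carrier.
    t = np.arange(FS) / FS
    emitted = modulate(np.sin(2 * np.pi * 1_000 * t))

In practice the modulated samples would drive the ultrasonic transducer array through a DAC and power amplifier, and the pan-tilt stage would aim the resulting beam at the target object.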
