Auditory augmented reality: Object sonification for the visually impaired

Augmented reality applications have focused on visually integrating virtual objects into real environments. In this paper, we propose auditory augmented reality, in which we integrate acoustic virtual objects into the real world. We sonify objects that do not intrinsically produce sound, with the purpose of revealing additional information about them. Using spatialized (3D) audio synthesis, acoustic virtual objects are placed at specific real-world coordinates, obviating the need to explicitly tell the user where they are. Thus, by leveraging the innate human capacity for 3D sound source localization and source separation, we create a natural audio user interface. In contrast with previous work, we do not create acoustic scenes by transducing low-level (for instance, pixel-based) visual information. Instead, we use computer vision methods to identify high-level features of interest in an RGB-D stream, which are then sonified as virtual objects at their respective real-world coordinates. Since our visual and auditory senses are both inherently spatial, this technique maps naturally between the two modalities, creating intuitive representations. We evaluate this concept with a head-mounted device, featuring modes that sonify flat surfaces, navigable paths, and human faces.
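The core mapping the abstract describes, from a detected object's 3D position to spatial-audio cues, can be illustrated with a minimal sketch. This is an assumption-laden simplification, not the paper's implementation: the function name and the simple equal-power panning with inverse-distance attenuation are illustrative stand-ins for the HRTF-based spatialization a real system would use.

```python
import math

def position_to_audio_params(x, y, z):
    """Map an object's camera-frame position (metres; x right, y up,
    z forward) to simple spatial cues: azimuth and elevation angles,
    plus left/right gains approximating an interaural level difference.
    Hypothetical sketch; a full system would convolve with HRTFs."""
    distance = math.sqrt(x * x + y * y + z * z)
    azimuth = math.degrees(math.atan2(x, z))      # positive = right of centre
    elevation = (math.degrees(math.asin(y / distance))
                 if distance > 0 else 0.0)
    gain = 1.0 / max(distance, 0.5)               # inverse-distance attenuation
    pan = math.sin(math.radians(azimuth))         # -1 (left) .. +1 (right)
    left = gain * math.sqrt((1.0 - pan) / 2.0)    # equal-power stereo panning
    right = gain * math.sqrt((1.0 + pan) / 2.0)
    return azimuth, elevation, left, right
```

For example, an object detected 1 m straight ahead yields zero azimuth and equal left/right gains, while an object directly to the right yields a 90-degree azimuth and an almost entirely right-channel signal, so the listener hears the virtual object roughly where the real one is.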
