Rendering visual events as sounds: Spatial attention capture by auditory augmented reality

Many salient visual events coincide with auditory events, such as seeing and hearing a car pass by. Information from the visual and auditory senses can be combined to create a stable percept of the stimulus, and access to related, coincident visual and auditory information can help with spatial tasks such as localization. However, not all visual information has an analogous auditory percept, as when viewing a computer monitor. Here, we describe a system capable of detecting salient visual events and rendering them as localizable auditory events. The system uses a neuromorphic camera (DAVIS 240B) to detect logarithmic changes in brightness intensity in the scene, which can be interpreted as salient visual events. Participants were blindfolded and asked to use the device to detect new objects in the scene and to determine the direction of motion of a moving visual object. Results suggest that the system is robust enough to support simple detection of new salient stimuli and to accurately encode the direction of visual motion. As neuromorphic devices are likely to become faster and smaller, systems of this kind should become more feasible.
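The detection and rendering steps described above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a frame-based approximation of the sensor (a real neuromorphic camera emits events asynchronously per pixel), a hypothetical contrast threshold of 0.2, and simple constant-power stereo panning as a stand-in for whatever spatial audio rendering the system actually uses. The function names `detect_events` and `stereo_gains` are illustrative only.

```python
import math

SENSOR_WIDTH = 240  # horizontal resolution of a DAVIS 240-series sensor, in pixels


def detect_events(prev_frame, next_frame, threshold=0.2):
    """Return (x, y, polarity) for each pixel whose log intensity changed
    by at least `threshold` between two frames.

    Frames are lists of rows of strictly positive intensities.
    The threshold value is an assumption chosen for illustration.
    """
    events = []
    for y, (prev_row, next_row) in enumerate(zip(prev_frame, next_frame)):
        for x, (p, n) in enumerate(zip(prev_row, next_row)):
            delta = math.log(n) - math.log(p)
            if abs(delta) >= threshold:
                # Polarity +1 marks a brightness increase, -1 a decrease.
                events.append((x, y, 1 if delta > 0 else -1))
    return events


def stereo_gains(x, width=SENSOR_WIDTH):
    """Map a pixel column to (left_gain, right_gain) with constant-power
    panning, so that events on the left of the scene sound louder in the
    left ear. This is a stand-in, not the rendering used in the paper.
    """
    pan = x / (width - 1)        # 0.0 = far left, 1.0 = far right
    angle = pan * math.pi / 2
    return math.cos(angle), math.sin(angle)


if __name__ == "__main__":
    prev = [[1.0, 1.0, 1.0, 1.0]]
    curr = [[1.0, 2.0, 1.0, 0.5]]
    for x, y, polarity in detect_events(prev, curr):
        left, right = stereo_gains(x, width=4)
        print(x, y, polarity, round(left, 3), round(right, 3))
```

Because events are keyed to pixel position, an object moving across the scene would produce a sequence of events whose pan position shifts in the same direction, which is the cue a listener could use to judge direction of motion.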
