Audio-visual saliency map: Overview, basic models and hardware implementation

In this paper we provide an overview of audiovisual saliency map models. In the simplest model, the location of the auditory source is modeled as a Gaussian, and several methods of combining the auditory and visual saliency information are compared. We then present experimental results from applying simple audio-visual integration models to cognitive scene analysis. We validate these simple audio-visual saliency models with a hardware convolutional network architecture and real data recorded from moving audio-visual objects. The latter system was developed in the Torch language by extending the attention.lua (code) and attention.ui (GUI) files that implement Culurciello's visual attention model.
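As a minimal sketch of the simplest model described above, the Lua/Torch snippet below builds a Gaussian auditory saliency map around an estimated source location and fuses it with a visual saliency map. This is an illustrative assumption of the scheme, not the authors' attention.lua code; the map size, source location, sigma, and fusion weights are all hypothetical placeholders.

```lua
-- Sketch (assumed): Gaussian auditory saliency map fused with a visual map.
require 'torch'

local H, W = 120, 160          -- map resolution (illustrative)
local mu_x, mu_y = 100, 60     -- estimated auditory source location (illustrative)
local sigma = 15               -- spatial uncertainty of the auditory estimate

-- Auditory saliency map: 2D Gaussian centered on the estimated source
local A = torch.zeros(H, W)
for y = 1, H do
  for x = 1, W do
    local d2 = (x - mu_x)^2 + (y - mu_y)^2
    A[y][x] = math.exp(-d2 / (2 * sigma * sigma))
  end
end

-- Visual saliency map V would come from the visual attention model;
-- here it is only a random placeholder in [0, 1].
local V = torch.rand(H, W)

-- Two simple ways of combining the auditory and visual information:
local S_add = V * 0.5 + A * 0.5   -- weighted linear combination
local S_mul = torch.cmul(V, A)    -- multiplicative (coincidence) fusion
```

The weighted sum keeps visually salient locations active even when the auditory estimate is uncertain, while the multiplicative scheme emphasizes only locations where both modalities agree.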
