ViNet: Pushing the limits of Visual Modality for Audio-Visual Saliency Prediction
Vineet Gandhi | Shreyank Jyoti | Pradeep Yarlagadda | Shyamgopal Karthik | Ramanathan Subramanian | Samyak Jain
[1] Jorge Dias,et al. Attentional Mechanisms for Socially Interactive Robots–A Survey , 2014, IEEE Transactions on Autonomous Mental Development.
[2] Santanu Chaudhury,et al. Visual saliency guided video compression algorithm , 2013, Signal Process. Image Commun.
[3] P. Maragos,et al. STAViS: Spatio-Temporal AudioVisual Saliency Network , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[4] Petros Maragos,et al. SUSiNet: See, Understand and Summarize It , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).
[5] Mubarak Shah,et al. Action MACH: a spatio-temporal Maximum Average Correlation Height filter for action recognition , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.
[6] A. Coutrot,et al. How saliency, faces, and sound influence gaze in dynamic social scenes. , 2014, Journal of vision.
[7] Kyle Min,et al. TASED-Net: Temporally-Aggregating Spatial Encoder-Decoder Network for Video Saliency Detection , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[8] Aykut Erdem,et al. Spatio-Temporal Saliency Networks for Dynamic Saliency Prediction , 2016, IEEE Transactions on Multimedia.
[9] Xiongkuo Min,et al. A Multimodal Saliency Model for Videos With High Audio-Visual Correspondence , 2020, IEEE Transactions on Image Processing.
[10] Thomas Brox,et al. U-Net: Convolutional Networks for Biomedical Image Segmentation , 2015, MICCAI.
[11] Hugo Larochelle,et al. Recurrent Mixture Density Network for Spatiotemporal Visual Attention , 2016, ICLR.
[12] Qingshan Liu,et al. Video Saliency Prediction Using Enhanced Spatiotemporal Alignment Network , 2020, Pattern Recognit.
[13] Antoine Coutrot,et al. Influence of soundtrack on eye movements during video exploration , 2012.
[14] Sasa Bodiroza,et al. Evaluating the Effect of Saliency Detection and Attention Manipulation in Human-Robot Interaction , 2013, Int. J. Soc. Robotics.
[15] John M. Henderson,et al. Clustering of Gaze During Dynamic Scene Viewing is Predicted by Motion , 2011, Cognitive Computation.
[16] Rainer Stiefelhagen,et al. Multimodal saliency-based attention for object-based scene analysis , 2011, 2011 IEEE/RSJ International Conference on Intelligent Robots and Systems.
[17] Shanmuganathan Raman,et al. Facial Expression Recognition Using Visual Saliency and Deep Learning , 2017, 2017 IEEE International Conference on Computer Vision Workshops (ICCVW).
[18] Jan Theeuwes,et al. Pip and pop: nonspatial auditory signals improve spatial visual search. , 2008, Journal of experimental psychology. Human perception and performance.
[19] E. Van der Burg,et al. Audiovisual events capture attention: evidence from temporal order judgments. , 2008, Journal of vision.
[20] Garrison W. Cottrell,et al. Visual saliency model for robot cameras , 2008, 2008 IEEE International Conference on Robotics and Automation.
[21] H. McGurk,et al. Hearing lips and seeing voices , 1976, Nature.
[22] Federica Proietto Salanitri,et al. Video Saliency Detection with Domain Adaptation using Hierarchical Gradient Reversal Layers , 2020, ArXiv.
[23] Ali Borji,et al. Revisiting Video Saliency: A Large-Scale Benchmark and a New Model , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[24] Esa Rahtu,et al. DAVE: A Deep Audio-Visual Embedding for Dynamic Saliency Prediction , 2019.
[25] Song Wang,et al. SalSAC: A Video Saliency Prediction Model with Shuffled Attentions and Correlation-Based ConvLSTM , 2020, AAAI.
[26] Nanning Zheng,et al. Visual Saliency Based Object Tracking , 2009, ACCV.
[27] Vineet Gandhi,et al. GAZED: Gaze-guided Cinematic Editing of Wide-Angle Monocular Video Recordings , 2020, CHI.
[28] Chen Sun,et al. Rethinking Spatiotemporal Feature Learning: Speed-Accuracy Trade-offs in Video Classification , 2017, ECCV.
[29] Andrew Zisserman,et al. Look, Listen and Learn , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[30] Alexandre Bernardino,et al. Multimodal saliency-based bottom-up attention: a framework for the humanoid robot iCub , 2008, 2008 IEEE International Conference on Robotics and Automation.
[31] Frédo Durand,et al. What Do Different Evaluation Metrics Tell Us About Saliency Models? , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[32] Qi Zhao,et al. SALICON: Saliency in Context , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[33] Chenliang Xu,et al. Audio-Visual Event Localization in Unconstrained Videos , 2018, ECCV.
[34] Ivan V. Bajic,et al. Saliency-Aware Video Compression , 2014, IEEE Transactions on Image Processing.
[35] Noel E. O'Connor,et al. Simple vs complex temporal recurrences for video saliency prediction , 2019, BMVC.
[36] Mohan S. Kankanhalli,et al. Static saliency vs. dynamic saliency: a comparative study , 2013, ACM Multimedia.
[37] Antoine Coutrot,et al. Multimodal Saliency Models for Videos , 2016.
[38] Mohan S. Kankanhalli,et al. Audio Matters in Visual Attention , 2014, IEEE Transactions on Circuits and Systems for Video Technology.
[39] Xiongkuo Min,et al. Fixation prediction through multimodal analysis , 2015, 2015 Visual Communications and Image Processing (VCIP).
[40] Andrew Owens,et al. Audio-Visual Scene Analysis with Self-Supervised Multisensory Features , 2018, ECCV.
[41] Guillermo Sapiro,et al. SalGaze: Personalizing Gaze Estimation using Visual Saliency , 2019, 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW).
[42] C. Schmid,et al. Actions in context , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.
[43] Hanqiu Sun,et al. Video Saliency Prediction Using Spatiotemporal Residual Attentive Networks , 2020, IEEE Transactions on Image Processing.
[44] Antonio Torralba,et al. SoundNet: Learning Sound Representations from Unlabeled Video , 2016, NIPS.
[45] C. Spence,et al. Crossmodal binding: Evaluating the “unity assumption” using audiovisual speech stimuli , 2007, Perception & psychophysics.
[46] Tae-Hyun Oh,et al. Learning to Localize Sound Source in Visual Scenes , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[47] Petros Maragos,et al. A perceptually based spatio-temporal computational framework for visual saliency estimation , 2015, Signal Process. Image Commun.
[48] Andrew Zisserman,et al. Objects that Sound , 2017, ECCV.
[49] Luc Van Gool,et al. Creating Summaries from User Videos , 2014, ECCV.
[50] Wenguan Wang,et al. Deep Visual Attention Prediction , 2017, IEEE Transactions on Image Processing.
[51] Haibin Ling,et al. Revisiting Video Saliency Prediction in the Deep Learning Era , 2021, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[52] Zulin Wang,et al. Predicting Video Saliency with Object-to-Motion CNN and Two-layer Convolutional LSTM , 2017, ECCV.
[53] Vineet Gandhi,et al. Tidying Deep Saliency Prediction Architectures , 2020, 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).