Paying Attention to Descriptions Generated by Image Captioning Models
Hamed R. Tavakoli | Rakshith Shetty | Ali Borji | Jorma Laaksonen