Computational framework for fusing eye movements and spoken narratives for image annotation
Cecilia Ovesdotter Alm | Jeff B. Pelz | Emily Prud'hommeaux | Preethi Vaidyanathan
[1] Xin Wang,et al. Role of domain knowledge in developing user-centered medical-image indexing , 2012, J. Assoc. Inf. Sci. Technol..
[2] Moreno I. Coco,et al. Scan Patterns Predict Sentence Production in the Cross-Modal Processing of Visual Scenes , 2012, Cogn. Sci..
[3] A Pollatsek,et al. The use of information below fixation in reading and in visual search. , 1993, Canadian journal of experimental psychology = Revue canadienne de psychologie experimentale.
[4] Md. Monirul Islam,et al. A review on automatic image annotation techniques , 2012, Pattern Recognit..
[5] Preethi Vaidyanathan,et al. Visual-Linguistic Semantic Alignment: Fusing Human Gaze and Spoken Narratives for Image Region Annotation , 2017 .
[6] Julie C. Sedivy,et al. Eye movements and spoken language comprehension: Effects of visual context on syntactic ambiguity resolution , 2002, Cognitive Psychology.
[7] Zenzi M. Griffin,et al. Why Look? Reasons for Eye Movements Related to Language Production. , 2004 .
[8] Alex Pentland,et al. Learning words from sights and sounds: a computational model , 2002, Cogn. Sci..
[9] Cecilia Ovesdotter Alm,et al. Using Co-Captured Face, Gaze, and Verbal Reactions to Images of Varying Emotional Content for Analysis and Semantic Alignment , 2017, AAAI Workshops.
[10] Yifan Peng,et al. Studying Relationships between Human Gaze, Description, and Computer Vision , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.
[11] David A. Forsyth,et al. Matching Words and Pictures , 2003, J. Mach. Learn. Res..
[12] Jeff B. Pelz,et al. Visualinguistic Approach to Medical Image Understanding , 2012, AMIA.
[13] M. Just,et al. Eye fixations and cognitive processes , 1976, Cognitive Psychology.
[14] Joyce Yue Chai,et al. Incorporating Temporal and Semantic Information with Eye Gaze for Automatic Word Acquisition in Multimodal Conversational Systems , 2008, EMNLP.
[15] J. Shanteau. How much information does an expert use? Is it relevant? , 1992 .
[16] Zenzi M. Griffin,et al. What the Eyes Say About Speaking , 2000, Psychological Science.
[17] Julie C. Sedivy,et al. Integration of visual and linguistic information in spoken language comprehension , 1995, Science.
[18] Nathan Schneider,et al. Association for Computational Linguistics: Human Language Technologies , 2011 .
[19] Dumitru Erhan,et al. Show and Tell: Lessons Learned from the 2015 MSCOCO Image Captioning Challenge , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[20] Luc Van Gool,et al. Object Referring in Videos with Language and Human Gaze , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[21] Luc Van Gool,et al. Object Referring in Visual Scene with Spoken Language , 2017, 2018 IEEE Winter Conference on Applications of Computer Vision (WACV).
[22] Jordi Pont-Tuset,et al. Convolutional Oriented Boundaries: From Image Segmentation to High-Level Tasks , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[23] M. Tanenhaus,et al. Introduction to the special issue on language–vision interactions , 2007 .
[24] Pertti Vakkari,et al. Subject Knowledge, Source of Terms, and Term Selection in Query Expansion: An Analytical Study , 2002, ECIR.
[25] Jeff B. Pelz,et al. Fusing eye movements and observer narratives for expert-driven image-region annotations , 2016, ETRA.
[26] Douglas DeCarlo,et al. Robust clustering of eye movement recordings for quantification of visual interest , 2004, ETRA.
[27] Jianfei Cai,et al. Beyond pixels: A comprehensive survey from bottom-up to semantic image segmentation and cosegmentation , 2015, J. Vis. Commun. Image Represent..
[28] K. Rayner,et al. Making and correcting errors during sentence comprehension: Eye movements in the analysis of structurally ambiguous sentences , 1982, Cognitive Psychology.
[29] S. P. Lloyd,et al. Least squares quantization in PCM , 1982, IEEE Trans. Inf. Theory.
[30] Norman I. Badler,et al. Temporal scene analysis: conceptual descriptions of object movements. , 1975 .
[31] C. Lawrence Zitnick,et al. Adopting Abstract Images for Semantic Scene Understanding , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[32] Ali Farhadi,et al. Situation Recognition: Visual Semantic Role Labeling for Image Understanding , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[33] F. Quimby. What's in a picture? , 1993, Laboratory animal science.
[34] Dan Klein,et al. Improved Inference for Unlexicalized Parsing , 2007, NAACL.
[35] Jitendra Malik,et al. Normalized cuts and image segmentation , 1997, Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition.
[36] M. Tanenhaus,et al. Time Course of Frequency Effects in Spoken-Word Recognition: Evidence from Eye Movements , 2001, Cognitive Psychology.
[37] Robert L. Mercer,et al. The Mathematics of Statistical Machine Translation: Parameter Estimation , 1993, CL.
[38] Z. Griffin. Why Look? Reasons for Eye Movements Related to Language Production. , 2004 .
[39] W. Levelt,et al. Viewing and naming objects: eye movements during noun phrase production , 1998, Cognition.
[40] Pietro Perona,et al. Microsoft COCO: Common Objects in Context , 2014, ECCV.
[41] Emiel Krahmer,et al. DIDEC: The Dutch Image Description and Eye-tracking Corpus , 2018, COLING.
[42] Paul Boersma,et al. Praat, a system for doing phonetics by computer , 2002 .
[43] Moreno I. Coco,et al. The impact of attentional, linguistic, and visual features during object naming , 2013, Front. Psychol..
[44] Cecilia Ovesdotter Alm,et al. Multimodal Alignment for Affective Content , 2018, AAAI Workshops.
[45] Jana Holsanova,et al. The Dynamics of Picture Viewing and Picture Description , 2006 .
[46] Cecilia Ovesdotter Alm,et al. Object Categorization: Words and Pictures: Categories, Modifiers, Depiction, and Iconography , 2009 .
[47] Gerd Herzog,et al. VIsual TRAnslator: Linking perceptions and natural language descriptions , 1994, Artificial Intelligence Review.
[48] Stephen M. Fiore,et al. Perceptual (Re)learning: A Leverage Point for Human-Centered Computing , 2007, IEEE Intelligent Systems.
[49] Li Fei-Fei,et al. Towards total scene understanding: Classification, annotation and segmentation in an automatic framework , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.
[50] A. Murat Tekalp,et al. Automatic Image Annotation Using Adaptive Color Classification , 1996, CVGIP Graph. Model. Image Process..
[51] Andrew Zisserman,et al. OBJCUT: Efficient Segmentation Using Top-Down and Bottom-Up Cues , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[52] Moreno I. Coco,et al. Sentence Production in Naturalistic Scenes with Referential Ambiguity , 2010 .
[53] Kate Saenko,et al. Integrating Language and Vision to Generate Natural Language Descriptions of Videos in the Wild , 2014, COLING.
[54] Pietro Perona,et al. A Bayesian hierarchical model for learning natural scene categories , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).
[55] Rohini K. Srihari,et al. Automatic Indexing and Content-Based Retrieval of Captioned Images , 1995, Computer.
[56] Antoine Geissbühler,et al. A Review of Content-Based Image Retrieval Systems in Medical Applications: Clinical Benefits and Future Directions , 2004 .
[57] K. Rayner. Eye movements in reading and information processing: 20 years of research. , 1998, Psychological bulletin.
[58] Roger M. Cooper,et al. The control of eye fixation by the meaning of spoken language: A new methodology for the real-time investigation of speech perception, memory, and language processing. , 1974 .
[59] Hermann Ney,et al. Improved Alignment Models for Statistical Machine Translation , 1999, EMNLP.
[60] Yee Whye Teh,et al. Names and faces in the news , 2004, Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004. CVPR 2004..
[61] Karen Holtzblatt,et al. Contextual design , 1997, INTR.
[62] E. Krupinski,et al. The importance of perception research in medical imaging. , 2000, Radiation medicine.
[63] Fei-Fei Li,et al. Deep visual-semantic alignments for generating image descriptions , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[64] Yejin Choi,et al. Generalizing Image Captions for Image-Text Parallel Corpus , 2013, ACL.
[65] Sanja Fidler,et al. What Are You Talking About? Text-to-Image Coreference , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.
[66] Paul R. Smart,et al. Knowledge Elicitation: Methods, Tools and Techniques , 2015 .
[67] Daniel C. Richardson,et al. Looking To Understand: The Coupling Between Speakers' and Listeners' Eye Movements and Its Relationship to Discourse Comprehension , 2005, Cogn. Sci..
[68] Peter J Pronovost,et al. Identifying and categorising patient safety hazards in cardiovascular operating rooms using an interdisciplinary approach: a multisite study , 2012, BMJ quality & safety.
[69] A. Treisman,et al. A feature-integration theory of attention , 1980, Cognitive Psychology.
[70] Jiebo Luo,et al. Unsupervised Alignment of Natural Language Instructions with Video Segments , 2014, AAAI.
[71] Paul A. Viola,et al. Robust Real-Time Face Detection , 2001, International Journal of Computer Vision.
[72] Jeff B. Pelz,et al. SNAG: Spoken Narratives and Gaze Dataset , 2018, ACL.
[73] Marcel Worring,et al. Content-Based Image Retrieval at the End of the Early Years , 2000, IEEE Trans. Pattern Anal. Mach. Intell..
[74] Eli Saber,et al. Probabilistic approach for extracting regions of interest in digital images , 2010, J. Electronic Imaging.
[75] J. Trueswell,et al. Interpreting pronouns and demonstratives in Finnish: Evidence for a form-specific approach to reference resolution , 2008 .
[76] Carla E. Brodley,et al. ASSERT: A Physician-in-the-Loop Content-Based Retrieval System for HRCT Image Databases , 1999, Comput. Vis. Image Underst..
[77] Femke F. van der Meulen. Coordination of eye gaze and speech in sentence production , 2003 .
[78] James Ze Wang,et al. Automatic Linguistic Indexing of Pictures by a Statistical Modeling Approach , 2003, IEEE Trans. Pattern Anal. Mach. Intell..
[79] Elizabeth A. Krupinski,et al. Research and applications: Investigating the link between radiologists' gaze, diagnostic decision, and image content , 2013, J. Am. Medical Informatics Assoc..
[80] Wenji Mao,et al. Social Computing: From Social Informatics to Social Intelligence , 2007, IEEE Intell. Syst..
[81] Michael Gygli,et al. Fast Object Class Labelling via Speech , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[82] Chen Yu,et al. On the Integration of Grounding Language and Learning Objects , 2004, AAAI.
[83] David L. Waltz. Generating and Understanding Scene Descriptions. , 1980 .
[84] Reynold Bailey,et al. Fusing Dialogue and Gaze From Discussions of 2D and 3D Scenes , 2019, ICMI.
[85] Jason Dykes,et al. Human-Centered Approaches in Geovisualization Design: Investigating Multiple Methods Through a Long-Term Case Study , 2011, IEEE Transactions on Visualization and Computer Graphics.
[86] Mark Q. Shaw,et al. Automatic Image Segmentation by Dynamic Region Growth and Multiresolution Merging , 2009, IEEE Transactions on Image Processing.
[87] Jorma Laaksonen,et al. Paying Attention to Descriptions Generated by Image Captioning Models , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[88] Stephen Clark,et al. Improving Multi-Modal Representations Using Image Dispersion: Why Less is Sometimes More , 2014, ACL.
[89] Tara N. Sainath,et al. State-of-the-Art Speech Recognition with Sequence-to-Sequence Models , 2017, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[90] Jeff B. Pelz,et al. Computational Integration of Human Vision and Natural Language through Bitext Alignment , 2015, VL@EMNLP.
[91] Deb Roy,et al. Integration of speech and vision using mutual information , 2000, 2000 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.00CH37100).
[92] Philip Heng Wai Leong,et al. Adapting content-based image retrieval techniques for the semantic annotation of medical images , 2016, Comput. Medical Imaging Graph..
[93] Zeshu Shao,et al. Predicting Naming Latencies for Action Pictures: Dutch Norms , 2022 .
[94] Hermann Ney,et al. A Systematic Comparison of Various Statistical Alignment Models , 2003, CL.
[95] Chen Yu,et al. A multimodal learning interface for grounding spoken language in sensory perceptions , 2003, ICMI '03.
[96] D. Scott. Perceptual learning. , 1974, Queen's nursing journal.
[97] Lei Zhang,et al. Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[98] Gregory J. Zelinsky,et al. Specifying the relationships between objects, gaze, and descriptions for scene understanding , 2013 .
[99] David A. Forsyth,et al. Object Recognition as Machine Translation: Learning a Lexicon for a Fixed Image Vocabulary , 2002, ECCV.
[100] Lamberto Ballan,et al. Love Thy Neighbors: Image Annotation by Exploiting Image Metadata , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).
[101] Ben Taskar,et al. Alignment by Agreement , 2006, NAACL.
[102] M A Just,et al. A theory of reading: from eye fixations to comprehension. , 1980, Psychological review.