Reference Production as Search: The Impact of Domain Size on the Production of Distinguishing Descriptions.

When producing a description of a target referent in a visual context, speakers need to choose a set of properties that distinguish it from its distractors. Computational models of language production/generation usually model this as a search process and predict that the time taken will increase both with the number of distractors in a scene and with the number of properties required to distinguish the target. These predictions are reminiscent of classic findings in visual search; however, unlike models of reference production, visual search models also predict that search can become very efficient under certain conditions, something that reference production models do not consider. This paper investigates the predictions of these models empirically. In two experiments, we show that the time taken to plan a referring expression, as reflected by speech onset latencies, is influenced by distractor set size and by the number of properties required, but this crucially depends on the discriminability of the properties under consideration. We discuss the implications for current models of reference production and recent work on the role of salience in visual search.
