Every Object Tells a Story

Most work within the computational event modeling community has tended to focus on the interpretation and ordering of events that are associated with verbs and event nominals in linguistic expressions. What is often overlooked in the construction of a global interpretation of a narrative is the role contributed by the objects participating in these structures, and the latent events and activities conventionally associated with them. Recently, the analysis of visual images has also broadened the scope of how events can be identified, by anchoring both linguistic expressions and ontological labels to segments, subregions, and properties of images. When event descriptions are semantically grounded in their visualization, the importance of object-based attributes becomes more apparent. In this position paper, we look at the narrative structure of objects: that is, how objects reference events through their intrinsic attributes, such as affordances, purposes, and functions. We argue not only that objects encode conventionalized events, but that when they are composed within specific habitats, the ensemble can be viewed as modeling coherent event sequences, thereby enriching the global interpretation of the evolving narrative being constructed.
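As a rough illustration of the idea that objects carry latent event information which can be pooled when they are situated together in a habitat, consider the minimal Python sketch below. It is only a sketch under assumed names: ObjectEntry, Habitat, telic, affordances, and latent_events are hypothetical labels introduced here for illustration, not the paper's formalism or any existing library.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class ObjectEntry:
    """An object together with the latent events it conventionally encodes (illustrative only)."""
    name: str
    telic: str                                             # purpose/function event, e.g. "cut" for a knife
    affordances: List[str] = field(default_factory=list)   # further activities the object affords

@dataclass
class Habitat:
    """A context (e.g., a kitchen) in which a set of objects is situated."""
    name: str
    objects: List[ObjectEntry] = field(default_factory=list)

    def latent_events(self) -> List[str]:
        """Collect the events conventionally associated with the situated objects."""
        events: List[str] = []
        for obj in self.objects:
            events.append(f"{obj.telic}({obj.name})")
            events.extend(f"{act}({obj.name})" for act in obj.affordances)
        return events

# Illustrative usage: a kitchen habitat whose objects jointly suggest a cooking narrative.
kitchen = Habitat(
    name="kitchen",
    objects=[
        ObjectEntry("knife", telic="cut", affordances=["grasp"]),
        ObjectEntry("pan", telic="fry", affordances=["hold"]),
        ObjectEntry("stove", telic="heat", affordances=["turn_on"]),
    ],
)
print(kitchen.latent_events())
# -> ['cut(knife)', 'grasp(knife)', 'fry(pan)', 'hold(pan)', 'heat(stove)', 'turn_on(stove)']
# A pool of latent events that a narrative model could then order into a coherent
# sequence (grasp -> cut -> turn_on -> heat -> fry ...).
```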
