Using Visual Information to Predict Lexical Preference

Most NLP systems make predictions based solely on linguistic (textual or spoken) input. We show how to use visual information to make better linguistic predictions. We focus on selectional preference: specifically, determining the plausible noun arguments for particular verb predicates. For each argument noun, we extract visual features from corresponding images on the web. For each verb predicate, we train a classifier to select the visual features that are indicative of its preferred arguments. We show that for certain verbs, using visual information can significantly improve performance over a baseline. In these successful cases, visual information remains useful even in the presence of co-occurrence information derived from web-scale text. We assess a variety of training configurations, varying the classes of visual features, the methods of image acquisition, and the numbers of images.
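As a rough sketch of the per-verb setup described above (not the authors' exact pipeline: the choice of linear classifier, the feature extractor, and the hypothetical visual_histogram helper that fetches web images for a noun and returns a fixed-length visual feature vector are all assumptions here), one could train a classifier over visual feature vectors of plausible and implausible argument nouns for each verb:

    # Minimal sketch of per-verb selectional preference classification from visual features.
    # `visual_histogram` is a hypothetical helper (not from the paper) that retrieves web
    # images for a noun and returns a fixed-length feature vector, e.g. an averaged
    # bag-of-visual-words histogram over quantized local descriptors.
    import numpy as np
    from sklearn.svm import LinearSVC  # the choice of linear learner is an assumption

    def train_verb_classifier(plausible_nouns, implausible_nouns, visual_histogram):
        """Learn which visual features indicate plausible arguments of one verb."""
        X = np.array([visual_histogram(n) for n in plausible_nouns + implausible_nouns])
        y = np.array([1] * len(plausible_nouns) + [0] * len(implausible_nouns))
        clf = LinearSVC(C=1.0)
        clf.fit(X, y)
        return clf

    def plausibility_score(clf, noun, visual_histogram):
        """Signed distance from the decision boundary, used as a plausibility score."""
        return float(clf.decision_function([visual_histogram(noun)])[0])

A verb-noun pair could then be judged plausible when its score exceeds a threshold tuned on held-out data; how the original work scores and evaluates pairs is not specified in the abstract.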
