Semi-supervised methods for expanding psycholinguistics norms by integrating distributional similarity with the structure of WordNet

We present two complementary methods for expanding psycholinguistic norms. The first is a random-traversal spreading activation approach that transfers existing norms onto semantically related terms through the WordNet relations of synonymy, hypernymy, and pertainymy, with the aim of approaching full coverage of the English language. The second uses recent advances in distributional similarity representations to transfer existing norms to their nearest neighbors in a high-dimensional vector space. Both methods, along with a naive hybrid that combines them, significantly outperform a state-of-the-art resource expansion system on our pilot task of imageability expansion. We evaluated these systems in a cross-validation experiment using 8,188 norms drawn from the existing psycholinguistics literature. We also validated the quality of the combined norms in a small study on Amazon Mechanical Turk (AMT).
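The second method's core idea, transferring a norm from rated words to an unrated word through vector-space neighbors, can be illustrated with a minimal sketch. This is not the system described above: the function names, the two-dimensional toy vectors, the ratings, and the choice of a similarity-weighted average over the k nearest rated neighbors are all assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def transfer_norm(word, vectors, norms, k=2):
    """Estimate a rating for `word` from its k most similar rated neighbors.

    Hypothetical helper: uses a similarity-weighted average, which is an
    assumption, not necessarily the weighting used in the paper.
    """
    target = vectors[word]
    scored = [
        (cosine(target, vectors[w]), rating)
        for w, rating in norms.items()
        if w != word and w in vectors
    ]
    # Keep the k neighbors with the highest similarity; ignore non-positive ones.
    top = [(s, r) for s, r in sorted(scored, reverse=True)[:k] if s > 0]
    total = sum(s for s, _ in top)
    if total == 0:
        return None  # no usable neighbor
    return sum(s * r for s, r in top) / total


# Toy data: vectors and imageability ratings are invented for this example.
vectors = {
    "dog": [1.0, 0.1],
    "cat": [0.9, 0.2],
    "idea": [0.1, 1.0],
    "puppy": [1.0, 0.15],
}
norms = {"dog": 6.0, "cat": 6.2, "idea": 2.0}

estimate = transfer_norm("puppy", vectors, norms, k=2)
print(f"estimated imageability for 'puppy': {estimate:.2f}")
```

In this toy setting the unrated word inherits a value close to those of its two nearest rated neighbors, while the distant word contributes nothing. A real setup would use learned embeddings and a tuned neighborhood size.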
