Why is penguin more similar to polar bear than to sea gull? Analyzing conceptual knowledge in distributional models

What do powerful models of word meaning created from distributional data (e.g., Word2vec (Mikolov et al., 2013), BERT (Devlin et al., 2019), and ELMo (Peters et al., 2018)) represent? What causes words to be similar in the semantic space? What type of information is lacking? This thesis proposal presents a framework for investigating the information encoded in distributional semantic models. Several analysis methods have been suggested, but they have been shown to be limited and are not well understood. The proposed approach pairs observations made on actual corpora with insights obtained from data manipulation experiments. The expected outcome is a better understanding of (1) the semantic information we can infer purely from linguistic co-occurrence patterns and (2) the potential of distributional semantic models to pick up linguistic evidence.

[1] Antske Fokkens, et al. Towards interpretable, data-derived distributional meaning representations for reasoning: A dataset of properties and concepts, 2019, GWC.

[2] Ming-Wei Chang, et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, 2019, NAACL.

[3] Paul Miller, et al. Representation of Word Meaning in the Intermediate Projection Layer of a Neural Language Model, 2018, BlackboxNLP@EMNLP.

[4] Marti A. Hearst. Automatic Acquisition of Hyponyms from Large Text Corpora, 1992, COLING.

[5] J. Gibson. The visual perception of objective motion and subjective movement, 1994, Psychological review.

[6] Douwe Kiela, et al. No Training Required: Exploring Random Encoders for Sentence Classification, 2019, ICLR.

[7] Yejin Choi, et al. Do Neural Language Representations Learn Physical Commonsense?, 2019, CogSci.

[8] Willem H. Zuidema, et al. Visualisation and 'diagnostic classifiers' reveal how recurrent and recursive neural networks process hierarchical structure, 2017, J. Artif. Intell. Res.

[9] Catherine Havasi, et al. ConceptNet 5.5: An Open Multilingual Graph of General Knowledge, 2016, AAAI.

[10] Tony Veale, et al. The Agile Cliché: Using Flexible Stereotypes as Building Blocks in the Construction of an Affective Lexicon, 2013, New Trends of Research in Ontologies and Lexical Resources.

[11] Roy Schwartz, et al. How Well Do Distributional Models Capture Different Types of Semantic Knowledge?, 2015, ACL.

[12] Katrin Erk, et al. What do you know about an alligator when you know the company it keeps, 2016.

[13] Jascha Sohl-Dickstein, et al. SVCCA: Singular Vector Canonical Correlation Analysis for Deep Learning Dynamics and Interpretability, 2017, NIPS.

[14] Adam Lopez, et al. Understanding Learning Dynamics Of Language Models with SVCCA, 2018, NAACL.

[15] David P Vinson, et al. Semantic feature production norms for a large set of objects and events, 2008, Behavior research methods.

[16] Stephen Clark, et al. From distributional semantics to feature norms: grounding semantic models in human perceptual data, 2015, IWCS.

[17] Allan Collins, et al. Facilitating retrieval from semantic memory: The effect of repeating part of an inference, 1970.

[18] Eneko Agirre, et al. Probing for Semantic Classes: Diagnosing the Meaning Content of Word Embeddings, 2019, ACL.

[19] Akira Utsumi, et al. Computational Exploration of Metaphor Comprehension Processes Using a Semantic Space Model, 2011, Cogn. Sci.

[20] Wanxiang Che, et al. Learning Semantic Hierarchies via Word Embeddings, 2014, ACL.

[21] Núria Bel, et al. A Word-Embedding-based Sense Index for Regular Polysemy Representation, 2015, VS@HLT-NAACL.

[22] Lora Aroyo, et al. Capturing Ambiguity in Crowdsourcing Frame Disambiguation, 2018, HCOMP.

[23] Mark S. Seidenberg, et al. Semantic feature production norms for a large set of living and nonliving things, 2005, Behavior research methods.

[24] Luke S. Zettlemoyer, et al. Deep Contextualized Word Representations, 2018, NAACL.

[25] Geoffrey Zweig, et al. Linguistic Regularities in Continuous Space Word Representations, 2013, NAACL.

[26] Samuel R. Bowman, et al. Language Modeling Teaches You More than Translation Does: Lessons Learned Through Auxiliary Syntactic Task Analysis, 2018, BlackboxNLP@EMNLP.

[27] Hinrich Schütze, et al. Intrinsic Subspace Evaluation of Word Embedding Representations, 2016, ACL.

[28] Yonatan Belinkov, et al. Analysis Methods in Neural Language Processing: A Survey, 2018, TACL.

[29] Antske Fokkens, et al. Firearms and Tigers are Dangerous, Kitchen Knives and Zebras are Not: Testing whether Word Embeddings Can Tell, 2018, BlackboxNLP@EMNLP.

[30] Siobhan Chapman. Logic and Conversation, 2005.

[31] Robert Dale, et al. Computational Interpretations of the Gricean Maxims in the Generation of Referring Expressions, 1995, Cogn. Sci.

[32] Jonathan Berant, et al. CommonsenseQA: A Question Answering Challenge Targeting Commonsense Knowledge, 2019, NAACL.

[33] George A. Miller, et al. WordNet: A Lexical Database for English, 1995, HLT.

[34] Yanfen Hao, et al. Learning to Understand Figurative Language: From Similes to Metaphors to Irony, 2007.

[35] Vera Demberg, et al. LingoTurk: managing crowdsourced tasks for psycholinguistics, 2016, NAACL.

[36] Jeroen Geertzen, et al. The Centre for Speech, Language and the Brain (CSLB) concept property norms, 2013, Behavior research methods.

[37] Aurélie Herbelot, et al. Building a shared world: mapping distributional to model-theoretic semantic spaces, 2015, EMNLP.

[38] Tony Veale, et al. Creative Language Retrieval: A Robust Hybrid of Information Retrieval and Linguistic Creativity, 2011, ACL.

[39] Gemma Boleda, et al. Distributional vectors encode referential attributes, 2015, EMNLP.

[40] M. Garrett, et al. Representing the meanings of object and action words: The featural and unitary semantic space hypothesis, 2004, Cognitive Psychology.

[41] Guillaume Lample, et al. Evaluation of Word Vector Representations by Subspace Alignment, 2015, EMNLP.

[42] Catherine Havasi, et al. Representing General Relational Knowledge in ConceptNet 5, 2012, LREC.

[43] A. Glenberg, et al. Symbol Grounding and Meaning: A Comparison of High-Dimensional and Embodied Theories of Meaning, 2000.

[44] Michael N. Jones, et al. Redundancy in Perceptual and Linguistic Experience: Comparing Feature-Based and Distributional Models of Semantic Representation, 2010, Top. Cogn. Sci.

[45] Stanley B Klein, et al. What memory is, 2015, Wiley interdisciplinary reviews. Cognitive science.

[46] Aurélie Herbelot. What is in a text, what isn't, and what this has to do with lexical semantics, 2013, IWCS.

[47] Yulia Tsvetkov, et al. Correlation-based Intrinsic Evaluation of Word Vector Representations, 2016, RepEval@ACL.