Do Language Models Know the Way to Rome?

The global geometry of language models is important for a range of applications, but language model probes tend to evaluate only relatively local relations, for which ground truths are easily obtained. In this paper we exploit the fact that in geography, ground truths are available beyond local relations. In a series of experiments, we evaluate the extent to which language model representations of city and country names are isomorphic to real-world geography, e.g., if you tell a language model where Paris and Berlin are, does it know the way to Rome? We find that language models generally encode limited geographic information, with larger models performing best, suggesting that geographic knowledge can be induced from higher-order co-occurrence statistics.
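
As a rough illustration of the kind of probe the abstract describes, the sketch below fits a linear map from pretrained language-model representations of city names to real-world coordinates and checks whether a held-out city ("Rome") lands near its true location. The specific model (bert-base-uncased), the small city list, the mean-pooled embeddings, and the linear-regression probe are illustrative assumptions, not the paper's actual experimental setup.

```python
# Minimal sketch of a geographic probe: embed city names with a pretrained LM,
# fit a linear map to (latitude, longitude), and test on a held-out city.
# Model, cities, and probe choice are assumptions for illustration only.
import torch
from transformers import AutoTokenizer, AutoModel
from sklearn.linear_model import LinearRegression

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

# Hypothetical supervision: a few city names paired with (latitude, longitude).
train_cities = {"Paris": (48.86, 2.35), "Berlin": (52.52, 13.41),
                "Madrid": (40.42, -3.70), "Warsaw": (52.23, 21.01)}
test_cities = {"Rome": (41.90, 12.50)}

def embed(name):
    # Mean-pool the last hidden layer over the name's subword tokens.
    inputs = tokenizer(name, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state
    return hidden.mean(dim=1).squeeze(0).numpy()

X_train = [embed(city) for city in train_cities]
y_train = list(train_cities.values())

# Linear probe from embedding space to geographic coordinates.
probe = LinearRegression().fit(X_train, y_train)

for city, (lat, lon) in test_cities.items():
    pred_lat, pred_lon = probe.predict([embed(city)])[0]
    print(f"{city}: predicted ({pred_lat:.1f}, {pred_lon:.1f}), "
          f"actual ({lat:.1f}, {lon:.1f})")
```

If such a probe generalizes to cities unseen during fitting, the embedding space is (approximately linearly) isomorphic to real-world geography in the sense the abstract asks about; if not, the geometry encodes little geographic structure.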
