Adapting Entities across Languages and Cultures

How would you explain Bill Gates to a German? He is associated with founding a company in the United States, so perhaps the German founder Carl Benz could stand in for Gates in those contexts. This type of translation is called adaptation in the translation community (Vinay and Darbelnet, 1995). Until now, this task has not been done computationally. Automatic adaptation could be used in natural language processing for machine translation and, indirectly, for generating new question answering datasets and for education. We propose two automatic methods and compare them to human results for this novel NLP task. First, a structured knowledge base adapts named entities using their shared properties. Second, vector arithmetic and orthogonal embedding mappings identify better candidates, but at the expense of interpretable features. We evaluate our methods through a new dataset¹ of human adaptations.

1 When Translation Misses the Mark

Imagine reading a translation from German, “I saw Merkel eating a Berliner from Dietsch on the ICE”. This sentence is opaque without cultural context. An extreme cultural adaptation for an American audience could render the sentence as “I saw Biden eating a Boston Cream from Dunkin’ Donuts on the Acela”, elucidating that Merkel holds a political post similar to Biden’s; that Dietsch (like Dunkin’ Donuts) is a mid-range purveyor of baked goods; that both Berliners and Boston Creams are filled, sweet pastries named after a city; and that the ICE and the Acela are slightly ritzier high-speed trains. Human translators make this adaptation when it is appropriate to the translation (Gengshen, 2003).

¹ Available at https://go.umd.edu/adaptation

Top Adaptations for Bill Gates:

WikiData         3CosAdd    Human
F. Zeppelin      congstar   A. Bechtolsheim
Günther Jauch    Alnatura   Dietmar Hopp
N. Harnoncourt   GMX        Carl Benz

Table 1: WikiData and unsupervised embeddings (3CosAdd) generate adaptations of an entity, such as Bill Gates. Human adaptations are gathered for evaluation.
American and German entities are color coded.

Because adaptation is understudied, we leave the full translation task to future work. Instead, we focus on the task of cultural adaptation of entities: given an entity in a source culture, what is the corresponding entity in the target culture? Most Americans would not recognize Christian Drosten, but the most efficient explanation to an American would be to say that he is the “German Anthony Fauci” (Loh, 2020). We provide top adaptations suggested by algorithms and humans for another American involved with the pandemic response, Bill Gates, in Table 1. Can machines reliably find these analogs with minimal supervision? We generate these adaptations with structured knowledge bases (Section 3) and word embeddings (Section 4). We elicit human adaptations (Section 5) to evaluate whether our automatic adaptations are plausible (Section 5.3).

2 Wer ist Bill Gates? (Who is Bill Gates?)

We define cultural adaptation and motivate its application for tasks like creating culturally-centered training data for QA. Vinay and Darbelnet (1995) define adaptation as translation in which the relationship, not the literal meaning, between the receiver and the content needs to be recreated. You could formulate our task as a traditional analogy, Drosten:Germany :: Fauci:United States (Turney, 2008; Gladkova et al., 2016), but despite this superficial resemblance (explored in Section 4), traditional approaches to analogy ignore the influence of culture and typically stay within a single language. Yet analogies are tightly bound with culture; humans struggle with analogies outside their own culture (Freedle, 2003). This task can help identify named entities (Kasai et al., 2019; Arora et al., 2019; Jain et al., 2019) and aid understanding of other cultures (Katan and Taibi, 2004).

2.1 . . . and why Bill Gates?

This task requires a list of named entities adaptable to other cultures. Our entities come from two sources: a subset of the top 500 most visited German/English Wikipedia pages and the Non-Official Characterization list (Veale, 2016, NOC), “a source of stereotypical knowledge regarding popular culture, famous people (real and fictional) and their trade-mark qualities, behaviours and settings”. Wikipedia contains a plethora of singers and actors; we filter the top 500 pages to avoid a pop-culture skew.² We additionally select all Germans and a subset of Americans from the Veale NOC list, as it is human-curated, verified, and covers a broader historical period than popular Wikipedia pages.

Like other semantic relationships (Boyd-Graber et al., 2006), adaptation is not symmetric. Thus, we adapt entities in both directions; while Berlin is the German Washington, DC, there is less consensus on what is the American Berlin, as Berlin is at once the capital, a tech hub, and a film hub. A full list of our entities is provided in Appendix D.

3 Adaptation from a Knowledge Base

We first adapt entities with a knowledge base. We use WikiData (Vrandečić and Krötzsch, 2014), a structured, human-annotated representation of Wikipedia entities that is actively developed. This resource is well-suited to the task because its features are standardized both within and across languages, and many knowledge bases explicitly encode the nationality of individuals, places, and creative works. Each entity in the knowledge base is a discrete sparse vector, where most dimensions are unknown or not applicable (e.g., a building does not have a spouse). We discuss the applicability of using Wikipedia (i.e., what proportion of the English Wikipedia is visited from the United States) in Appendix B.

For example, Angela Merkel is a human (instance of), German (country of citizenship), a politician (occupation), a Rotarian (member of), a Lutheran (religion), 1.65 meters tall (height), and has a PhD (academic degree). How would we find the “most similar” American adaptation to Angela Merkel?
Intuitively, we should find someone whose nation-
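This property-overlap intuition can be sketched as follows. The snippet below is a minimal, illustrative sketch only: the entities, the hand-picked (property, value) feature sets, and the Jaccard scoring are toy assumptions for exposition, not the paper's actual WikiData pipeline or weighting.

```python
# Toy sketch of knowledge-base adaptation: each entity is a sparse set of
# (property, value) pairs, and target-culture candidates are ranked by
# overlap with the source entity's properties.

def jaccard(a, b):
    """Overlap between two property sets."""
    return len(a & b) / len(a | b)

# Illustrative WikiData-style feature sets (hypothetical, hand-picked).
merkel = {("instance of", "human"), ("occupation", "politician"),
          ("country", "Germany"), ("position", "head of government")}

candidates = {
    "Joe Biden": {("instance of", "human"), ("occupation", "politician"),
                  ("country", "United States"),
                  ("position", "head of government")},
    "Tom Hanks": {("instance of", "human"), ("occupation", "actor"),
                  ("country", "United States")},
}

def adapt(entity, candidates):
    # Drop the nationality feature itself, so it does not penalize
    # every cross-culture candidate equally.
    strip = lambda feats: {f for f in feats if f[0] != "country"}
    scored = {name: jaccard(strip(entity), strip(feats))
              for name, feats in candidates.items()}
    return max(scored, key=scored.get)

print(adapt(merkel, candidates))  # prints "Joe Biden"
```

With these toy features, Biden shares all of Merkel's non-nationality properties (score 1.0) while Hanks shares only one of four (score 0.25), so the politician is preferred.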

References

[1] Ellen M. Voorhees, et al. The TREC-8 Question Answering Track Report, 1999, TREC.
[2] Jeffrey Dean, et al. Distributed Representations of Words and Phrases and their Compositionality, 2013, NIPS.
[3] Jordan Boyd-Graber, et al. Towards Deconfounding the Influence of Subject's Demographic Characteristics in Question Answering, 2021, ArXiv.
[4] Philipp Koehn, et al. Moses: Open Source Toolkit for Statistical Machine Translation, 2007, ACL.
[5] Omer Levy, et al. Linguistic Regularities in Sparse and Explicit Word Representations, 2014, CoNLL.
[6] Eneko Agirre, et al. A robust self-learning method for fully unsupervised cross-lingual mappings of word embeddings, 2018, ACL.
[7] Hagen Schulze. The Course of German Nationalism: From Frederick the Great to Bismarck 1763-1867, 1991.
[8] Viktor Hangya, et al. Unsupervised Parallel Sentence Extraction with Parallel Segment Detection Helps Machine Translation, 2019, ACL.
[9] Jeff Johnson, et al. Billion-Scale Similarity Search with GPUs, 2017, IEEE Transactions on Big Data.
[10] Jordan L. Boyd-Graber, et al. Adding dense, weighted connections to WordNet, 2005.
[11] Satoshi Matsuoka, et al. Analogy-based detection of morphological and semantic relations with word embeddings: what works and what doesn't, 2016, NAACL.
[12] Peter D. Turney. A Uniform Approach to Analogies, Synonyms, Antonyms, and Associations, 2008, COLING.
[13] David Katan, et al. Translating Cultures: An Introduction for Translators, Interpreters and Mediators, 2014.
[14] Sabine Schulte im Walde, et al. Improving Zero-Shot-Learning for German Particle Verbs by using Training-Space Restrictions and Local Scaling, 2016, *SEM@ACL.
[15] Shi Feng, et al. Trick Me If You Can: Human-in-the-loop Generation of Adversarial Question Answering Examples, 2019, Trans. Assoc. Comput. Linguistics.
[16] Tomas Mikolov, et al. Enriching Word Vectors with Subword Information, 2016, TACL.
[17] Guillaume Lample, et al. Word Translation Without Parallel Data, 2017, ICLR.
[18] Sebastian Riedel, et al. MLQA: Evaluating Cross-lingual Extractive Question Answering, 2019, ACL.
[19] Zachary C. Lipton, et al. Entity Projection via Machine Translation for Cross-Lingual NER, 2019, EMNLP.
[20] David Robinson, et al. Das Cabinet des Dr. Caligari, 1997.
[21] Hermann Ney, et al. Effective Cross-lingual Transfer of Neural Machine Translation Models without Shared Vocabularies, 2019, ACL.
[22] Quoc V. Le, et al. Exploiting Similarities among Languages for Machine Translation, 2013, ArXiv.
[23] Jungo Kasai, et al. Low-resource Deep Entity Resolution with Transfer and Active Learning, 2019, ACL.
[24] Bofang Li, et al. The (too Many) Problems of Analogical Reasoning with Word Vectors, 2017, *SEMEVAL.
[25] Mona T. Diab, et al. Multi-Domain Goal-Oriented Dialogues (MultiDoGO): Strategies toward Curating and Annotating Large Scale Dialogue Data, 2019, EMNLP.
[26] Mikel Artetxe, et al. On the Cross-lingual Transferability of Monolingual Representations, 2019, ACL.
[27] Geoffrey Zweig, et al. Linguistic Regularities in Continuous Space Word Representations, 2013, NAACL.
[28] T. Veale. Round Up The Usual Suspects: Knowledge-Based Metaphor Generation, 2016.
[29] Hu Gengshen. Translation as adaptation and selection, 2003.
[30] Jean-Paul Vinay, et al. Comparative stylistics of French and English: a methodology for translation, 1995.
[31] Roberto Navigli, et al. SemEval-2014 Task 3: Cross-Level Semantic Similarity, 2014, *SEMEVAL.
[32] R. Freedle. Correcting the SAT's ethnic and social-class bias: A method for reestimating SAT scores, 2003.
[33] Markus Krötzsch, et al. Wikidata, 2014, Commun. ACM.