When is Wall a Pared and when a Muro?: Extracting Rules Governing Lexical Selection

Learning fine-grained distinctions between vocabulary items is a key challenge in learning a new language. For example, the noun “wall” has different lexical manifestations in Spanish – “pared” refers to an indoor wall while “muro” refers to an outdoor wall. However, this kind of lexical distinction may not be obvious to non-native learners unless it is explained explicitly. In this work, we present a method for automatically identifying fine-grained lexical distinctions and extracting rules that explain these distinctions in a human- and machine-readable format. We confirm the quality of these extracted rules in a language learning setup for two languages, Spanish and Greek, where we use the rules to teach non-native speakers when to translate a given ambiguous word into each of its possible translations.
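
To make the idea of a human- and machine-readable rule concrete, the following is a minimal sketch and not the method of this work. It assumes hypothetical toy data in which each occurrence of English “wall” is described by a few invented context features (such as `modifier=stone`) and paired with the Spanish translation seen in an aligned sentence. The function name `extract_rules` and the thresholds `min_support` and `min_precision` are illustrative assumptions. A feature becomes a rule only if it occurs often enough and almost always co-occurs with one translation.

```python
from collections import Counter, defaultdict

# Hypothetical toy data (not from the paper): context features of English
# "wall" paired with the Spanish translation in an aligned sentence.
EXAMPLES = [
    ({"modifier=bedroom", "verb=paint"}, "pared"),
    ({"modifier=kitchen", "verb=paint"}, "pared"),
    ({"modifier=bedroom", "verb=hang"}, "pared"),
    ({"modifier=city", "verb=build"}, "muro"),
    ({"modifier=stone", "verb=build"}, "muro"),
    ({"modifier=stone", "verb=paint"}, "muro"),
]


def extract_rules(examples, min_support=2, min_precision=0.9):
    """Return single-feature rules of the form 'if feature then translation'.

    Illustrative heuristic only: keep a feature when it appears at least
    `min_support` times and its most frequent translation accounts for at
    least `min_precision` of those appearances.
    """
    # Count how often each feature co-occurs with each translation.
    counts = defaultdict(Counter)
    for features, translation in examples:
        for feature in features:
            counts[feature][translation] += 1

    rules = []
    for feature, by_translation in counts.items():
        translation, hits = by_translation.most_common(1)[0]
        support = sum(by_translation.values())
        precision = hits / support
        if support >= min_support and precision >= min_precision:
            rules.append({
                "if": feature,
                "then": translation,
                "support": support,
                "precision": precision,
            })
    # Sort for a stable, readable listing.
    return sorted(rules, key=lambda r: (r["then"], r["if"]))


RULES = extract_rules(EXAMPLES)
for rule in RULES:
    print(f'IF {rule["if"]} THEN translate "wall" as "{rule["then"]}"')
```

Each rule is a plain dictionary, so it can be serialized for a program or printed as a sentence for a learner. On this toy data, `verb=paint` yields no rule because it appears with both translations.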
