MWSA Task at GlobaLex 2020: RACAI's Word Sense Alignment System using a Similarity Measurement of Dictionary Definitions

This paper describes RACAI’s word sense alignment system, which participated in the Monolingual Word Sense Alignment shared task organized at the GlobaLex 2020 workshop. We discuss the system architecture and some of the challenges we faced, and present our results on several of the languages available for the task.
