Word sense disambiguation: an evaluation study of semi-supervised approaches with word embeddings

Word Sense Disambiguation (WSD) is a well-known problem in Natural Language Processing (NLP): automatically determining the most appropriate sense of a word in its context. Several machine learning approaches have been proposed to tackle the ambiguity of language, but the scarcity of labeled data for training supervised models has made semi-supervised learning (SSL) an attractive option. Furthermore, word embeddings have been shown to be an effective way to improve results on NLP tasks. This paper therefore adapts semi-supervised algorithms to WSD, using word embeddings from Word2Vec, FastText, and BERT models combined with part-of-speech tags as input. We conduct a systematic evaluation of four graph-based SSL models, analyzing how the results are influenced by their hyperparameters, the distance measures used to build the graphs, the percentage of labeled data, and the architectural variations of the word embeddings. We show that SSL algorithms given 10% of labeled data are strong baselines on the subsets of nouns and adjectives. Additionally, these algorithms do not need further training to disambiguate new words, which makes them competitive with supervised systems.
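
To make the graph-based setting concrete, the following is a minimal sketch, not the pipeline evaluated in the paper. It assumes one well-known graph-based SSL algorithm, Local and Global Consistency (LGC), chosen purely for illustration, since the abstract does not name the four models. It also assumes a symmetric k-nearest-neighbour graph built from cosine similarity, and it uses synthetic vectors as stand-ins for real Word2Vec, FastText, or BERT embeddings. The function names, the toy data, and the values of `k` and `alpha` are all hypothetical.

```python
import numpy as np


def knn_cosine_graph(X, k=10):
    """Symmetric k-nearest-neighbour affinity matrix based on cosine similarity."""
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    sim = Xn @ Xn.T
    np.fill_diagonal(sim, -np.inf)              # exclude self-loops
    n = X.shape[0]
    neighbours = np.argsort(-sim, axis=1)[:, :k]
    W = np.zeros((n, n))
    W[np.repeat(np.arange(n), k), neighbours.ravel()] = 1.0
    return np.maximum(W, W.T)                   # symmetrise the graph


def lgc(W, y, n_classes, alpha=0.9):
    """Closed-form LGC: F = (I - alpha * S)^-1 Y, with S = D^-1/2 W D^-1/2.

    Entries of y equal to -1 mark unlabeled instances.
    """
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    S = W * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    Y = np.zeros((len(y), n_classes))
    labeled = y >= 0
    Y[labeled, y[labeled]] = 1.0
    F = np.linalg.solve(np.eye(len(y)) - alpha * S, Y)
    return F.argmax(axis=1)


# Toy data (assumption): 40 context vectors of one ambiguous word,
# 20 per sense, drawn around two orthogonal centres.
rng = np.random.default_rng(0)
centres = np.array([[1.0, 0, 0, 0, 0], [0, 1.0, 0, 0, 0]])
true = np.repeat([0, 1], 20)
X = centres[true] + 0.1 * rng.standard_normal((40, 5))

# Reveal the sense of only 4 of the 40 instances (10% labeled data).
y = np.full(40, -1)
y[[0, 1, 20, 21]] = true[[0, 1, 20, 21]]

pred = lgc(knn_cosine_graph(X, k=10), y, n_classes=2)
print("accuracy:", (pred == true).mean())
```

In this sketch the distance measure, `k`, `alpha`, and the fraction of revealed labels are the kinds of settings the evaluation varies. Replacing cosine similarity with another distance only requires changing `knn_cosine_graph`.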
