Semi-supervised Learning with Induced Word Senses for State of the Art Word Sense Disambiguation

Word Sense Disambiguation (WSD) aims to determine the meaning of a word in context, and successful approaches are known to benefit many applications in Natural Language Processing. Although supervised learning has been shown to provide superior WSD performance, current sense-annotated corpora do not contain a sufficient number of instances per word type to train supervised systems for all words. While unsupervised techniques have been proposed to overcome this data sparsity problem, such techniques have not outperformed supervised methods. In this paper, we propose a new approach to building semi-supervised WSD systems that combines a small amount of sense-annotated data with information from Word Sense Induction, a fully unsupervised technique that automatically learns the different senses of a word based on how it is used. In three experiments, we show how sense induction models may be effectively combined to ultimately produce high-performance semi-supervised WSD systems that exceed the performance of state-of-the-art supervised WSD techniques trained on the same sense-annotated data. We anticipate that our results and released software will also benefit evaluation practices for sense induction systems, as well as researchers working on low-resource languages, by demonstrating how to quickly produce accurate WSD systems with minimal annotation effort.

[1]  Elie Bienenstock,et al.  Sphere Embedding: An Application to Part-of-Speech Induction , 2010, NIPS.

[2]  Kenneth Ward Church,et al.  Work on Statistical Methods for Word Sense Disambiguation , 1992 .

[3]  W. Marslen-Wilson,et al.  Making Sense of Semantic Ambiguity: Semantic Competition in Lexical Access , 2002 .

[4]  Ted Pedersen Duluth-WSI: SenseClusters Applied to the Sense Induction Task of SemEval-2 , 2010, SemEval@ACL.

[5]  Keith Stevens,et al.  Measuring the Impact of Sense Similarity on Word Sense Induction , 2011, ULNLP@EMNLP.

[6]  Suresh Manandhar,et al.  SemEval-2010 Task 14: Word Sense Induction &Disambiguation , 2010, SemEval@ACL.

[7]  Cheng Niu,et al.  Word Independent Context Pair Classification Model for Word Sense Disambiguation , 2005, CoNLL.

[8]  Bob Carpenter,et al.  The Benefits of a Model of Annotation , 2013, Transactions of the Association for Computational Linguistics.

[9]  Enis Sert,et al.  AI-KU: Using Substitute Vectors and Co-Occurrence Modeling For Word Sense Induction and Disambiguation , 2013, SemEval@NAACL-HLT.

[10]  Roberto Navigli A Quick Tour of Word Sense Disambiguation, Induction and Related Approaches , 2012, SOFSEM.

[11]  Eneko Agirre,et al.  Crowdsourced Word Sense Annotations and Difficult Words and Examples , 2015, IWCS.

[12]  Zhimao Lu,et al.  An Equivalent Pseudoword Solution to Chinese Word Sense Disambiguation , 2006, ACL.

[13]  Silvia Bernardini,et al.  Introducing and evaluating ukWaC , a very large web-derived corpus of English , 2008 .

[14]  Christian Biemann,et al.  Chinese Whispers - an Efficient Graph Clustering Algorithm and its Application to Natural Language Processing Problems , 2006 .

[15]  Roberto Navigli,et al.  SemEval-2007 Task 07: Coarse-Grained English All-Words Task , 2007, Fourth International Workshop on Semantic Evaluations (SemEval-2007).

[16]  Pavel Smrz,et al.  A New Approach to Pseudoword Generation , 2010, LREC.

[17]  Ted Pedersen,et al.  A Simple Approach to Building Ensembles of Naive Bayesian Classifiers for Word Sense Disambiguation , 2000, ANLP.

[18]  Roberto Navigli,et al.  Entity Linking meets Word Sense Disambiguation: a Unified Approach , 2014, TACL.

[19]  Gaël Varoquaux,et al.  Scikit-learn: Machine Learning in Python , 2011, J. Mach. Learn. Res..

[20]  Haizhou Li,et al.  Pseudo-Word for Phrase-Based Machine Translation , 2010, ACL.

[21]  Christiane Fellbaum,et al.  Book Reviews: WordNet: An Electronic Lexical Database , 1999, CL.

[22]  Roberto Navigli,et al.  A Large-Scale Pseudoword-Based Evaluation Framework for State-of-the-Art Word Sense Disambiguation , 2014, CL.

[23]  Hwee Tou Ng,et al.  Word Sense Disambiguation Improves Statistical Machine Translation , 2007, ACL.

[24]  Marianna Apidianaki,et al.  Latent Semantic Word Sense Induction and Disambiguation , 2011, ACL.

[25]  Dong-Hong Ji,et al.  Word Sense Disambiguation Using Label Propagation Based Semi-Supervised Learning , 2005, ACL.

[26]  Julie Weeds,et al.  Finding Predominant Word Senses in Untagged Text , 2004, ACL.

[27]  Anders Søgaard,et al.  Robust Semi-supervised and Ensemble-Based Methods in Word Sense Disambiguation , 2010, IceTAL.

[28]  Michael I. Jordan,et al.  Hierarchical Dirichlet Processes , 2006 .

[29]  Iryna Gurevych,et al.  FrameNet on the Way to Babel: Creating a Bilingual FrameNet Using Wiktionary as Interlingual Connection , 2013, ACL.

[30]  Ido Dagan,et al.  Similarity-Based Models of Word Cooccurrence Probabilities , 1998, Machine Learning.

[31]  Roberto Navigli,et al.  SemEval-2013 Task 12: Multilingual Word Sense Disambiguation , 2013, *SEMEVAL.

[32]  Ted Pedersen,et al.  Word Sense Discrimination by Clustering Contexts in Vector and Similarity Spaces , 2004, CoNLL.

[33]  Hwee Tou Ng,et al.  Word Sense Disambiguation with Semi-Supervised Learning , 2005, AAAI.

[34]  Eneko Agirre,et al.  Semeval-2007 Task 2 : Evaluating Word Sense Induction and Discrimination , 2007 .

[35]  Mirella Lapata,et al.  Ensemble Methods for Unsupervised WSD , 2006, ACL.

[36]  Silvia Bernardini,et al.  The WaCky wide web: a collection of very large linguistically processed web-crawled corpora , 2009, Lang. Resour. Evaluation.

[37]  Rada Mihalcea,et al.  Co-training and Self-training for Word Sense Disambiguation , 2004, CoNLL.

[38]  Nancy Ide,et al.  Making Sense of Word Sense Variation , 2009, SEW@NAACL-HLT.

[39]  Mirella Lapata,et al.  Bayesian Word Sense Induction , 2009, EACL.

[40]  Jing Wang,et al.  A Sense-Topic Model for Word Sense Induction with Unsupervised Data Enrichment , 2015, TACL.

[41]  Adam Kilgarriff,et al.  How Dominant Is the Commonest Sense of a Word? , 2004, TSD.

[42]  Roberto Navigli,et al.  Inducing Word Senses to Improve Web Search Result Clustering , 2010, EMNLP.

[43]  Keith Stevens,et al.  Evaluating Unsupervised Ensembles when applied to Word Sense Induction , 2012, ACL 2012.

[44]  Mark Stevenson,et al.  Unsupervised Domain Tuning to Improve Word Sense Disambiguation , 2013, HLT-NAACL.

[45]  Graeme Hirst,et al.  Automatic identification of words with novel but infrequent senses , 2011, PACLIC.

[46]  Sandra Kübler,et al.  Semi-Supervised Learning for Word Sense Disambiguation: Quality vs. Quantity , 2009, RANLP.

[47]  Hwee Tou Ng,et al.  It Makes Sense: A Wide-Coverage Word Sense Disambiguation System for Free Text , 2010, ACL.

[48]  Jurij D. Apresjan REGULAR POLYSEMY , 1974 .

[49]  Treebank Penn,et al.  Linguistic Data Consortium , 1999 .

[50]  Francis Bond,et al.  Linking and Extending an Open Multilingual Wordnet , 2013, ACL.

[51]  Zhiyuan Liu,et al.  Topical Word Embeddings , 2015, AAAI.

[52]  Andreas Stolcke,et al.  SRILM - an extensible language modeling toolkit , 2002, INTERSPEECH.

[53]  Timothy Baldwin,et al.  unimelb: Topic Modelling-based Word Sense Induction , 2013, SemEval@NAACL-HLT.

[54]  Michael I. Jordan,et al.  Latent Dirichlet Allocation , 2001, J. Mach. Learn. Res..

[55]  Christopher Stokoe Differentiating Homonymy and Polysemy in Information Retrieval , 2005, HLT/EMNLP.

[56]  Jean Véronis,et al.  HyperLex: lexical cartography for information retrieval , 2004, Comput. Speech Lang..

[57]  H. Schütze,et al.  Dimensions of meaning , 1992, Supercomputing '92.

[58]  Hinrich Sch Automatic Word Sense Discrimination , 1998 .

[59]  Héctor Martínez Alonso Annotation of regular polysemy: an empirical assessment of the underspecified sense , 2013 .

[60]  Nancy Ide,et al.  Introduction to the Special Issue on Word Sense Disambiguation: The State of the Art , 1998, Comput. Linguistics.

[61]  Pushpak Bhattacharyya,et al.  All Words Domain Adapted WSD: Finding a Middle Ground between Supervision and Unsupervision , 2010, ACL.

[62]  Mark Steyvers,et al.  Finding scientific topics , 2004, Proceedings of the National Academy of Sciences of the United States of America.

[63]  Deniz Yuret,et al.  FASTSUBS: An Efficient and Exact Procedure for Finding the Most Likely Lexical Substitutes Based on an N-Gram Language Model , 2012, IEEE Signal Processing Letters.

[64]  Yulia Tsvetkov,et al.  Augmenting English Adjective Senses with Supersenses , 2014, LREC.

[65]  David Jurgens,et al.  SemEval-2013 Task 13: Word Sense Induction for Graded and Non-Graded Senses , 2013, SemEval@NAACL-HLT.

[66]  Hinrich Schütze,et al.  Automatic Word Sense Discrimination , 1998, Comput. Linguistics.

[67]  Suresh Manandhar,et al.  Word Sense Induction Disambiguation Using Hierarchical Random Graphs , 2010, EMNLP.

[68]  Simone Paolo Ponzetto,et al.  BabelNet: Building a Very Large Multilingual Semantic Network , 2010, ACL.

[69]  Deniz Yuret,et al.  KU: Word Sense Disambiguation by Substitution , 2007, Fourth International Workshop on Semantic Evaluations (SemEval-2007).

[70]  Roberto Navigli,et al.  Clustering and Diversifying Web Search Results with Graph-Based Word Sense Induction , 2013, CL.

[71]  Zhimao Lu,et al.  Combining Neural Networks and Statistics for Chinese Word Sense Disambiguation , 2004, SIGHAN@ACL.

[72]  Roberto Navigli,et al.  Paving the Way to a Large-scale Pseudosense-annotated Dataset , 2013, HLT-NAACL.

[73]  David Yarowsky,et al.  Unsupervised Word Sense Disambiguation Rivaling Supervised Methods , 1995, ACL.

[74]  Eneko Agirre,et al.  On the Use of Automatically Acquired Examples for All-Nouns Word Sense Disambiguation , 2008, J. Artif. Intell. Res..

[75]  Mitchell P. Marcus,et al.  OntoNotes: The 90% Solution , 2006, NAACL.

[76]  George A. Miller,et al.  A Semantic Concordance , 1993, HLT.

[77]  David Yarowsky,et al.  Modeling Consensus: Classifier Combination for Word Sense Disambiguation , 2002, EMNLP.

[78]  Timothy Baldwin,et al.  Word Sense Induction for Novel Sense Detection , 2012, EACL.

[79]  M. A. R T H A P A L,et al.  Making fine-grained and coarse-grained sense distinctions , both manually and automatically , 2005 .

[80]  David Jurgens,et al.  An Evaluation of Graded Sense Disambiguation using Word Sense Induction , 2012, *SEMEVAL.

[81]  Carlo Strapparava,et al.  The role of domain information in Word Sense Disambiguation , 2002, Natural Language Engineering.

[82]  Preslav Nakov,et al.  Category-based Pseudowords , 2003, HLT-NAACL.

[83]  Eneko Agirre,et al.  Random Walks for Knowledge-Based Word Sense Disambiguation , 2014, CL.

[84]  Roberto Navigli,et al.  Word sense disambiguation: A survey , 2009, CSUR.

[85]  Tanja Gaustad,et al.  Statistical Corpus-Based Word Sense Disambiguation: Pseudowords vs. Real Ambiguous Words , 2001, ACL.

[86]  Eneko Agirre,et al.  Evaluating and optimizing the parameters of an unsupervised graph-based WSD algorithm , 2006 .

[87]  Karin M. Verspoor,et al.  What Can We Get From 1000 Tokens? A Case Study of Multilingual POS Tagging For Resource-Poor Languages , 2014, EMNLP.

[88]  Marine Carpuat,et al.  Improving Statistical Machine Translation Using Word Sense Disambiguation , 2007, EMNLP.

[89]  Nathanael Chambers,et al.  Improving the Use of Pseudo-Words for Evaluating Selectional Preferences , 2010, ACL.

[90]  Eneko Agirre,et al.  Exploring Automatic Word Sense Disambiguation with Decision Lists and the Web , 2000, SAIC@COLING.

[91]  Adam Kilgarriff,et al.  The Senseval-3 English lexical sample task , 2004, SENSEVAL@ACL.

[92]  Francis Bond,et al.  A Survey of WordNet Annotated Corpora , 2014, GWC.

[93]  Adam Kilgarriff,et al.  Framework and Results for English SENSEVAL , 2000, Comput. Humanit..

[94]  Hwee Tou Ng,et al.  An Empirical Evaluation of Knowledge Sources and Learning Algorithms for Word Sense Disambiguation , 2002, EMNLP.

[95]  Helmut Schmidt,et al.  Probabilistic part-of-speech tagging using decision trees , 1994 .