Unsupervised Does Not Mean Uninterpretable: The Case for Word Sense Induction and Disambiguation

The current trend in NLP is the use of highly opaque models, e.g. neural networks and word embeddings. While these models yield state-of-the-art results on a range of tasks, their drawback is poor interpretability. Using word sense induction and disambiguation (WSID) as an example, we show that it is possible to develop an interpretable model that matches state-of-the-art models in accuracy. Specifically, we present an unsupervised, knowledge-free WSID approach that is interpretable at three levels: the word sense inventory, the sense feature representations, and the disambiguation procedure. Experiments show that our model performs on par with state-of-the-art word sense embeddings and other unsupervised systems while being able to justify its decisions in human-readable form.
