Novel Word-sense Identification

Automatic lexical acquisition has been an active area of research in computational linguistics for over two decades, but the automatic identification of new word-senses has received attention only very recently. Previous work on this topic has been limited by the availability of appropriate evaluation resources. In this paper we present the largest corpus-based dataset of diachronic sense differences to date, which we believe will encourage further work in this area. We then describe several extensions to a state-of-the-art topic modelling approach for identifying new word-senses. On our dataset, which covers two different corpus pairs, the adapted method outperforms the original method at identifying both (a) types that have taken on a novel sense over time and (b) the token instances of such novel senses.
