Topical Clustering of MRD Senses Based on Information Retrieval Techniques

This paper describes a heuristic approach for automatically clustering senses in a machine-readable dictionary (MRD). Including these clusters in an MRD-based lexical database offers several benefits for word sense disambiguation (WSD). First, the clusters can serve as a coarser sense division, so unnecessarily fine sense distinctions can be avoided. Second, the clustered entries can serve as training material for a supervised WSD system. Third, if the algorithm is run on several MRDs, the clusters provide a means of linking senses across those MRDs to create an integrated lexical database. The method is implemented for clustering definition sentences in the Longman Dictionary of Contemporary English (LDOCE), using the topical word lists and topical cross-references in the Longman Lexicon of Contemporary English (LLOCE). Nearly half of the senses in the LDOCE can be linked precisely to a relevant LLOCE topic using a simple heuristic. When the definitions of the senses linked to the same topic are viewed as a single document, topical clustering of MRD senses closely resembles the information retrieval (IR) task of retrieving the documents relevant to a given query. Well-established IR techniques for weighting terms and ranking document relevance are therefore applied to find the topical clusters most relevant to the definition of each MRD sense. Finally, the performance of the approach is assessed in a series of experiments and evaluations.
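
The retrieval analogy can be illustrated with a minimal sketch. The abstract does not say which term-weighting or ranking formulas are used, so the TF-IDF weights and cosine similarity below are assumptions chosen as common IR defaults, not the paper's method. The topic labels, the definition texts, and all function names (`tokenize`, `build_idf`, `rank_topics`, and so on) are invented for illustration and are not taken from the LDOCE or the LLOCE.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep purely alphabetic tokens."""
    return [w for w in text.lower().split() if w.isalpha()]


def build_idf(topic_docs):
    """Inverse document frequency over the topic 'documents' (assumed weighting)."""
    n = len(topic_docs)
    df = Counter()
    for tokens in topic_docs.values():
        df.update(set(tokens))
    return {term: math.log(1 + n / count) for term, count in df.items()}


def tfidf(tokens, idf):
    """TF-IDF vector as a dict; terms unseen in any topic document are dropped."""
    tf = Counter(tokens)
    return {t: c * idf[t] for t, c in tf.items() if t in idf}


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_topics(definition, topic_texts):
    """Treat the sense definition as a query and rank topic documents by similarity."""
    topic_docs = {topic: tokenize(text) for topic, text in topic_texts.items()}
    idf = build_idf(topic_docs)
    query = tfidf(tokenize(definition), idf)
    scores = {
        topic: cosine(query, tfidf(tokens, idf))
        for topic, tokens in topic_docs.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical topic documents: each concatenates the definitions of senses
# already linked to that topic. The wording is made up for this example.
TOPICS = {
    "geography": "land beside a river or lake the side of a hill",
    "finance": "a place where money is kept and paid out on demand",
}

# Hypothetical definition of one unlinked sense of "bank".
ranking = rank_topics("land along the side of a river", TOPICS)
best_topic = ranking[0][0]
```

In this toy setting the sense is assigned to the top-ranked topic. A fuller system would presumably also need a relevance threshold, so that a sense with no good match is left unclustered.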
