Lexical Semantic Relatedness and Its Application in Natural Language Processing

Alexander Budanitsky
Department of Computer Science, University of Toronto
August 1999

A great variety of Natural Language Processing tasks, from word sense disambiguation to text summarization to speech recognition, rely heavily on the ability to measure semantic relatedness or distance between words of a natural language. This report is a comprehensive study of recent computational methods of measuring lexical semantic relatedness. A survey of methods, as well as their applications, is presented, and the question of evaluation is addressed both theoretically and experimentally. Application to the specific task of intelligent spelling checking is discussed in detail: the design of a prototype system for the detection and correction of malapropisms (words that are similar in spelling or sound to, but quite different in meaning from, intended words) is described, and results of experiments on using various measures as plug-ins are considered. Suggestions for research directions in the areas of measuring semantic relatedness and intelligent spelling checking are offered.
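The abstract describes a prototype malapropism checker into which different relatedness measures are plugged. The sketch below illustrates that plug-in design in Python under stated assumptions. It is not the report's actual algorithm. The edit-distance-one variant generator, the additive context score, the `margin` threshold, and the toy relatedness table (`toy_relatedness`, `TOY_RELATED`) are all hypothetical. A word is flagged when a lexicon word one edit away is more related to the surrounding words than the original is, by more than the margin.

```python
from typing import Callable, Dict, List, Optional, Sequence, Set

# Assumed interface: any relatedness measure can be plugged in, as long as it
# maps a word pair to a score where higher means "more related".
Relatedness = Callable[[str, str], float]

ALPHABET = "abcdefghijklmnopqrstuvwxyz"


def spelling_variants(word: str, lexicon: Set[str]) -> List[str]:
    """Lexicon words one edit (deletion, transposition, substitution,
    insertion) away from `word`. A hypothetical stand-in for a real generator."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in splits if b for c in ALPHABET]
    inserts = [a + c + b for a, b in splits for c in ALPHABET]
    candidates = set(deletes + transposes + replaces + inserts)
    candidates.discard(word)
    return sorted(w for w in candidates if w in lexicon)


def context_score(word: str, context: Sequence[str], relatedness: Relatedness) -> float:
    """Total relatedness of `word` to every word in its context (assumed additive)."""
    return sum(relatedness(word, other) for other in context)


def suspect_malapropisms(
    words: Sequence[str],
    lexicon: Set[str],
    relatedness: Relatedness,
    margin: float = 1.0,
) -> Dict[int, str]:
    """Map word positions to a suggested replacement when some spelling
    variant fits the surrounding words better by more than `margin`."""
    suggestions: Dict[int, str] = {}
    for i, word in enumerate(words):
        context = list(words[:i]) + list(words[i + 1:])
        best_score = context_score(word, context, relatedness) + margin
        best_variant: Optional[str] = None
        for variant in spelling_variants(word, lexicon):
            score = context_score(variant, context, relatedness)
            if score > best_score:
                best_variant, best_score = variant, score
        if best_variant is not None:
            suggestions[i] = best_variant
    return suggestions


# Hypothetical toy measure: 1.0 for listed pairs, 0.0 otherwise.
TOY_RELATED = {
    frozenset(pair)
    for pair in [("dairy", "milk"), ("dairy", "cheese"), ("milk", "cheese"), ("diary", "write")]
}


def toy_relatedness(a: str, b: str) -> float:
    return 1.0 if frozenset((a, b)) in TOY_RELATED else 0.0


if __name__ == "__main__":
    lexicon = {"milk", "cheese", "diary", "dairy", "write"}
    print(suspect_malapropisms(["milk", "cheese", "diary"], lexicon, toy_relatedness))
    # prints {2: 'dairy'}
```

Swapping `toy_relatedness` for a taxonomy-based or corpus-based measure changes only the `relatedness` argument, which is the sense in which measures act as plug-ins in this sketch. The additive context score and fixed margin are simplifications chosen for illustration.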
