Lexical chains as representations of context for the detection and correction of malapropisms

Natural language utterances are, in general, highly ambiguous, and a unique interpretation can usually be determined only by taking into account the constraining influence of the context in which the utterance occurred. Much of the research in natural language understanding in the last twenty years can be thought of as attempts to characterize and represent context and then derive interpretations that fit best with that context. Typically, this research was heavy with AI, taking context to be nothing less than a complete conceptual understanding of the preceding utterances. This was reasonable, as such an understanding of a text was often the main task anyway. However, there are many text-processing tasks that require only a partial understanding of the text, and hence a ‘lighter’ representation of context is sufficient. In this paper, we examine the idea of lexical chains as such a representation. We show how they can be constructed by means of WordNet, and how they can be applied in one particular linguistic task: the detection and correction of malapropisms.

A malapropism is the confounding of an intended word with another word of similar sound or similar spelling that has a quite different and malapropos meaning, e.g., an ingenuous [for ingenious] machine for peeling oranges. In this example, there is a one-letter difference between the malapropism and the correct word. Ignorance, or a simple typing mistake, might cause such errors. However, since ingenuous is a correctly spelled word, traditional spelling checkers cannot detect this kind of mistake. In section 4, we will propose an algorithm for detecting and correcting malapropisms that is based on the construction of lexical chains.
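The observation that a dictionary-based checker accepts the malapropism can be illustrated with a minimal Python sketch. The toy vocabulary and the function names `is_misspelled` and `spelling_variants` are assumptions made for illustration only; this is not the lexical-chain algorithm proposed in section 4, just a demonstration of why word-list lookup alone cannot flag the error, and of how the intended word appears among the one-edit spelling variants.

```python
import string

# Toy word list standing in for a spelling checker's dictionary (illustrative assumption).
VOCABULARY = {"an", "ingenuous", "ingenious", "machine", "for", "peeling", "oranges"}


def is_misspelled(word, vocabulary=VOCABULARY):
    """A traditional checker flags a word only if it is absent from the word list."""
    return word.lower() not in vocabulary


def spelling_variants(word, vocabulary=VOCABULARY):
    """Return real words one insertion, deletion, substitution, or transposition away."""
    letters = string.ascii_lowercase
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = {a + b[1:] for a, b in splits if b}
    transposes = {a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1}
    substitutions = {a + c + b[1:] for a, b in splits if b for c in letters}
    inserts = {a + c + b for a, b in splits for c in letters}
    candidates = deletes | transposes | substitutions | inserts
    # Keep only candidates that are real words, excluding the word itself.
    return sorted((candidates & vocabulary) - {word})


sentence = "an ingenuous machine for peeling oranges"
flagged = [w for w in sentence.split() if is_misspelled(w)]
print(flagged)                         # no word is flagged by dictionary lookup
print(spelling_variants("ingenuous"))  # the intended word is one substitution away
```

Deciding whether a variant fits the context better than the word actually written requires some representation of that context, which is the role the paper assigns to lexical chains.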
