Word Sense Disambiguation and Text Segmentation Based on Lexical Cohesion

In this paper, we describe how word sense ambiguity can be resolved with the aid of lexical cohesion. By checking lexical cohesion between the current word and lexical chains in the order of their salience, in tandem with the generation of lexical chains, we realize incremental word sense disambiguation based on the contextual information that lexical chains reveal. Next, we describe how segment boundaries of a text can be determined with the aid of lexical cohesion. We measure the plausibility of each point in the text as a segment boundary by computing the degree of agreement among the start and end points of lexical chains.

1 Introduction

A text is not a mere set of unrelated sentences. Rather, sentences in a text are about the same thing and are connected to each other [10]. Cohesion and coherence are said to contribute to such connection between sentences. While coherence is a semantic relationship and needs computationally expensive processing for identification, cohesion is a surface relationship among words in a text and is more accessible than coherence. Cohesion is roughly classified into reference¹, conjunction, and lexical cohesion². Except for conjunction, which explicitly indicates the relationship between sentences, the other two classes are considered similar in that the relationship between sentences is indicated by two semantically identical (or related) words. But lexical cohesion is far easier to identify than reference, because both words in a lexical cohesion relation appear in the text, while one word in a reference relation is a pronoun or is elided, and so carries less information from which to infer the other word in the relation automatically.

¹ Reference by pronouns and ellipsis in Halliday and Hasan's classification [3] is included here.
² Reference by full NPs, substitution, and lexical cohesion in Halliday and Hasan's classification are included here.
Based on this observation, we use lexical cohesion as a linguistic device for discourse analysis. Following [10], we call a sequence of words that are in a lexical cohesion relation with each other a lexical chain. Lexical chains tend to indicate portions of a text that form a semantic unit, and so different lexical chains tend to appear in a text as the topic changes. Therefore:

1. lexical chains provide a local context to aid in the resolution of word sense ambiguity;
2. lexical chains provide a clue for the determination of segment boundaries of the text [10].

In this paper, we first describe how word sense ambiguity can be resolved with the aid of lexical cohesion. As lexical chains are generated incrementally, they are recorded in a register in the order of their salience, which is based on their recency and length. Since a more salient lexical chain represents the nearby local context, checking lexical cohesion between the current word and the lexical chains in order of salience, in tandem with the generation of lexical chains, lets us realize incremental word sense disambiguation based on the contextual information that lexical chains reveal.

Next, we describe how segment boundaries of a text can be determined with the aid of lexical cohesion. Since the start and end points of lexical chains in the text tend to indicate the start and end points of segments, we can measure the plausibility of each point in the text as a segment boundary by computing the degree of agreement among the start and end points of lexical chains.
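The incremental disambiguation loop described above can be illustrated with a minimal Python sketch. Everything concrete here is a hypothetical choice for illustration, not the paper's exact procedure: the toy thesaurus `THESAURUS`, the `Chain` class, the cohesion test (a word sense coheres with a chain if both share a thesaurus category), the salience key (most recently extended chain first, ties broken by chain length), and the fallback of starting a new chain with the first listed sense when no chain coheres.

```python
from dataclasses import dataclass, field

# Hypothetical toy thesaurus: word -> list of (sense label, thesaurus category).
THESAURUS = {
    "bank": [("bank/finance", "MONEY"), ("bank/river", "WATER")],
    "money": [("money", "MONEY")],
    "loan": [("loan", "MONEY")],
    "river": [("river", "WATER")],
    "stream": [("stream", "WATER")],
}


@dataclass
class Chain:
    category: str
    words: list = field(default_factory=list)  # (position, chosen sense) pairs
    last_pos: int = -1  # position of the most recently added word

    def salience_key(self):
        # Assumed salience: recency of the last member first, then chain length.
        return (self.last_pos, len(self.words))


def disambiguate(tokens, thesaurus=THESAURUS):
    register = []  # lexical chains, kept sorted from most to least salient
    choices = {}   # token position -> chosen sense
    for pos, word in enumerate(tokens):
        senses = thesaurus.get(word)
        if not senses:
            continue  # word not in the thesaurus: it joins no chain
        chosen = None
        # Check cohesion against chains in order of salience; first match wins.
        for chain in register:
            for sense, category in senses:
                if category == chain.category:
                    chosen = (sense, chain)
                    break
            if chosen is not None:
                break
        if chosen is None:
            # Assumption: no cohesive chain, so open a new chain with the first sense.
            sense, category = senses[0]
            chain = Chain(category)
            register.append(chain)
            chosen = (sense, chain)
        sense, chain = chosen
        chain.words.append((pos, sense))
        chain.last_pos = pos
        choices[pos] = sense
        # Re-rank the register so the chain just extended becomes most salient.
        register.sort(key=lambda c: c.salience_key(), reverse=True)
    return choices, register
```

On the token sequence `river stream bank money bank loan`, the first `bank` joins the WATER chain (the only chain at that point), while the second `bank` is resolved to `bank/finance`, because the MONEY chain opened by `money` has become the most recent and hence the most salient chain in the register.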
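The boundary-plausibility idea can likewise be sketched in Python, assuming each lexical chain has already been reduced to a `(start, end)` pair of inclusive sentence indices. The scoring rule is an assumption standing in for "degree of agreement": the gap between sentence `i` and sentence `i + 1` scores the number of chains that end at `i` plus the number that start at `i + 1`. The function names `boundary_scores` and `best_boundaries` and the parameter `k` (how many boundaries to keep) are hypothetical.

```python
from collections import Counter


def boundary_scores(chain_spans, num_sentences):
    """Score every gap between sentence i and sentence i + 1.

    chain_spans: list of (start, end) sentence indices (inclusive), one per chain.
    Assumed score: chains ending at i plus chains starting at i + 1.
    """
    ends = Counter(end for _, end in chain_spans)
    starts = Counter(start for start, _ in chain_spans)
    # Counter returns 0 for missing keys, so gaps with no chain endpoints score 0.
    return {i: ends[i] + starts[i + 1] for i in range(num_sentences - 1)}


def best_boundaries(chain_spans, num_sentences, k):
    """Return the k highest-scoring gaps in text order (ties favor earlier gaps)."""
    scores = boundary_scores(chain_spans, num_sentences)
    ranked = sorted(scores, key=lambda i: (-scores[i], i))
    return sorted(ranked[:k])


spans = [(0, 2), (0, 2), (1, 2), (3, 5), (3, 4), (2, 3)]
print(boundary_scores(spans, 6))     # {0: 1, 1: 1, 2: 5, 3: 1, 4: 1}
print(best_boundaries(spans, 6, 1))  # [2]
```

Here gap 2 (between the third and fourth sentences) stands out because three chains end just before it and two begin just after it, which is the kind of agreement of chain endpoints the paragraph above treats as evidence of a segment boundary.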