On the Generality of Thesaurally derived Lexical Links

Cohesion is the property of a text that allows it to be read as a unified entity rather than as a series of unconnected sentences. Lexical cohesion may be detected using an external thesaurus, and the resulting representation can be used in a variety of language processing tasks. Our particular interest is in determining whether texts of different genres are similar in meaning. For this, we wish to derive a measure based on lexical cohesion. Consequently, we need to determine whether lexical cohesion is independent of genre or a function of it. This paper examines the statistics of lexically cohesive relations. Our method involves determining the distribution of lexically cohesive relations in several book-length texts. These texts are shown to have different reading complexities but equivalent cohesive properties. From this, we conclude that lexical cohesion is independent of reading complexity.
