Paper Information - Incorporating Dictionary and Corpus Information into a Context Vector Measure of Semantic Relatedness

Incorporating Dictionary and Corpus Information into a Context Vector Measure of Semantic Relatedness

This is to certify that I have examined this copy of a master's thesis by SIDDHARTH PATWARDHAN and have found that it is complete and satisfactory in all respects, and that any and all revisions required by the final examining committee have been made.

Acknowledgments

I would like to take this opportunity to thank a number of people, without whose support and encouragement this thesis would not have been possible. Firstly, I would like to thank my advisor Dr. Pedersen for being so thorough and patient, and for seeing me through this research till the end. I would like to thank my committee members, Dr. Gallian and Dr. Turner, for going over the thesis so carefully and for their insightful suggestions. I would also like to thank Bano, whose work we built upon and who was full of ideas throughout. I thank my fellow NLP group members – Saif, Bridget and Amruta – for their ideas and suggestions and my colleague Navdeep for proofreading and providing her thoughts on the thesis. I am grateful to Jason Rennie for providing a wonderful interface to WordNet and to Mona Diab for her feedback on the measures. I am also grateful to Diana Inkpen for her insights on the Vector measure.

Abstract

Humans are able to judge the relatedness of words (concepts) relatively easily, and are often in general agreement as to how related two words are. For example, few would disagree that "pencil" is more related to "paper" than it is to "boat". Miller and Charles (1991) attribute this human perception of relatedness to the overlap of contextual representations of words in the human mind, and there is at least some understanding of how humans are able to perform this task. However, it remains an open question as to how to create automatic computational methods that assign relatedness values or scores to pairs of concepts.
A number of measures of relatedness have been proposed, most of them relying on information taken from the lexical database WordNet, and possibly augmented with corpus-based statistics. In this thesis we study a number of such measures, and offer various refinements to those proposed. We then compare these measures along with three others in the context of a human relatedness study and in word sense disambiguation experiments. We find that the measures of Jiang and Conrath (1997) and Banerjee and Pedersen …

Siddharth Patwardhan

[1] Eneko Agirre, et al. Word Sense Disambiguation using Conceptual Density, 1996, COLING.

[2] Hinrich Schütze, et al. Automatic Word Sense Discrimination, 1998, Comput. Linguistics.

[3] George A. Miller, et al. A Semantic Concordance, 1993, HLT.

[4] W. Nelson Francis, et al. Frequency Analysis of English Usage: Lexicon and Grammar, 1983.

[5] Graeme Hirst, et al. Semantic distance in WordNet: An experimental, application-oriented evaluation of five measures, 2004.

[6] Yoshihiko Nitta, et al. Co-Occurrence Vectors From Corpora vs. Distance Vectors From Dictionaries, 1994, COLING.

[7] Ted Pedersen, et al. Using Measures of Semantic Relatedness for Word Sense Disambiguation, 2003, CICLing.

[8] Christiane Fellbaum, et al. Book Reviews: WordNet: An Electronic Lexical Database, 1999, CL.

[9] Ted Pedersen, et al. Extended Gloss Overlaps as a Measure of Semantic Relatedness, 2003, IJCAI.

[10] Michael E. Lesk, et al. Automatic sense disambiguation using machine readable dictionaries: how to tell a pine cone from an ice cream cone, 1986, SIGDOC '86.

[11] G. Miller, et al. Contextual correlates of semantic similarity, 1991.