Paper information - Textual Similarity: Comparing texts in order to discover how closely they discuss the same topics

This thesis describes the design and implementation of a tool for measuring textual similarity. It examines several aspects of text processing and graph searching in an attempt to define similarity, then proposes and implements a solution for measuring it. Word-sense disambiguation, part-of-speech tagging, and several graph-searching algorithms are described and used in the measurements. The tool is tested against human evaluations of textual similarity, and the thesis concludes that it can, to some degree, measure textual similarity with the same results as a human being.

Niklas Skamriis Boss | Andreas Schmidt Jensen
