A Method for Measuring Sentence Similarity and Its Application to Conversational Agents

This paper presents a novel algorithm for computing the similarity between very short texts of sentence length. The method takes into account not only semantic information but also the word order information implied in the sentences. First, the semantic similarity between two sentences is derived from a structured lexical database and from corpus statistics. Second, word order similarity is computed from the positions at which words appear in each sentence. Finally, sentence similarity is computed as a combination of semantic similarity and word order similarity. The proposed algorithm is applied to the real-world domain of conversational agents. Experimental results demonstrate that the proposed algorithm reduces the scripter's effort in devising a rule base for a conversational agent.
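The three steps described above can be illustrated with a minimal Python sketch. The abstract does not give the formulas, so everything specific here is an assumption made for illustration: the joint word list, the cosine comparison of semantic vectors, the normalized position-difference measure for word order, the weighting parameter `delta`, the `threshold` value, and all function names. The `exact_match` word similarity is only a placeholder; a measure backed by a lexical database and corpus statistics would be passed in as `word_sim` instead.

```python
import math
from typing import Callable, List

WordSim = Callable[[str, str], float]


def exact_match(a: str, b: str) -> float:
    """Placeholder word similarity (assumption): 1.0 if identical, else 0.0."""
    return 1.0 if a == b else 0.0


def joint_words(t1: List[str], t2: List[str]) -> List[str]:
    """Distinct words of both sentences, in order of first appearance."""
    return list(dict.fromkeys(t1 + t2))


def semantic_vector(tokens: List[str], joint: List[str], word_sim: WordSim) -> List[float]:
    """For each joint word, the best similarity to any word in the sentence."""
    return [max((word_sim(w, t) for t in tokens), default=0.0) for w in joint]


def order_vector(tokens: List[str], joint: List[str], word_sim: WordSim,
                 threshold: float = 0.5) -> List[int]:
    """For each joint word, the 1-based position of its most similar word
    in the sentence, or 0 if no word reaches the (assumed) threshold."""
    vec = []
    for w in joint:
        best_pos, best_sim = 0, 0.0
        for i, t in enumerate(tokens, start=1):
            s = word_sim(w, t)
            if s > best_sim:
                best_pos, best_sim = i, s
        vec.append(best_pos if best_sim >= threshold else 0)
    return vec


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(y * y for y in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(u, v)) / (nu * nv)


def order_similarity(r1: List[int], r2: List[int]) -> float:
    """1 - ||r1 - r2|| / ||r1 + r2||, so identical orderings score 1.0."""
    diff = math.sqrt(sum((a - b) ** 2 for a, b in zip(r1, r2)))
    total = math.sqrt(sum((a + b) ** 2 for a, b in zip(r1, r2)))
    return 1.0 if total == 0.0 else 1.0 - diff / total


def sentence_similarity(s1: str, s2: str, word_sim: WordSim = exact_match,
                        delta: float = 0.85) -> float:
    """Weighted combination of semantic and word order similarity.
    delta is an assumed weight, not a value taken from the abstract."""
    t1, t2 = s1.lower().split(), s2.lower().split()
    joint = joint_words(t1, t2)
    sem = cosine(semantic_vector(t1, joint, word_sim),
                 semantic_vector(t2, joint, word_sim))
    order = order_similarity(order_vector(t1, joint, word_sim),
                             order_vector(t2, joint, word_sim))
    return delta * sem + (1.0 - delta) * order
```

Under these assumptions, two sentences that contain the same words in a different order, such as "a quick brown dog jumps over the lazy fox" and "a quick brown fox jumps over the lazy dog", receive a full semantic score but a reduced word order score, so their combined similarity falls below 1. This is the effect that a purely bag-of-words measure would miss.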
