Semantic Similarity Measures for the Development of Thai Dialog System

Semantic similarity plays an important role in a number of applications, including information extraction, information retrieval, document clustering and ontology learning. Most work has concentrated on English and other European languages; for Thai, however, there has been no research on word semantic similarity. This paper presents an experiment, together with benchmark data sets, investigating the application of a WordNet-based machine measure to Thai word similarity. Because there is no functioning Thai WordNet, we also investigate using the English WordNet combined with machine translation of Thai words.
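The approach described above scores a pair of Thai words by translating them into English and measuring their similarity in the English WordNet. It can be sketched as follows. This is a minimal illustration under stated assumptions, not the measure used in the paper: the tiny taxonomy `HYPERNYM` and the dictionary `TRANSLATE` are hypothetical stand-ins for English WordNet and a machine translation system, and the path-length score 1 / (1 + shortest path) is only one common WordNet-style measure. When a Thai word has several English translations, the sketch keeps the highest-scoring pair.

```python
from collections import deque

# Hypothetical toy is-a taxonomy standing in for English WordNet:
# each concept maps to its hypernym (parent); "entity" is the root.
HYPERNYM = {
    "animal": "entity",
    "dog": "animal",
    "cat": "animal",
    "vehicle": "entity",
    "car": "vehicle",
    "bus": "vehicle",
}

# Hypothetical Thai-to-English table standing in for machine translation.
# One Thai word may yield several English candidates.
TRANSLATE = {
    "หมา": ["dog"],
    "แมว": ["cat"],
    "รถ": ["car", "vehicle"],
    "รถเมล์": ["bus"],
}


def neighbours(concept):
    """Return concepts one is-a edge away (children and parent)."""
    result = [c for c, parent in HYPERNYM.items() if parent == concept]
    if concept in HYPERNYM:
        result.append(HYPERNYM[concept])
    return result


def shortest_path(a, b):
    """Breadth-first search for the number of is-a edges between a and b."""
    if a == b:
        return 0
    seen = {a}
    queue = deque([(a, 0)])
    while queue:
        node, dist = queue.popleft()
        for nxt in neighbours(node):
            if nxt == b:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None  # the two concepts are not connected


def path_similarity(a, b):
    """Path-length similarity: 1 / (1 + shortest path length), 0 if unconnected."""
    d = shortest_path(a, b)
    return 0.0 if d is None else 1.0 / (1 + d)


def thai_similarity(w1, w2):
    """Translate both Thai words and keep the best-scoring English pair."""
    pairs = [(e1, e2) for e1 in TRANSLATE.get(w1, []) for e2 in TRANSLATE.get(w2, [])]
    return max((path_similarity(e1, e2) for e1, e2 in pairs), default=0.0)


print(thai_similarity("หมา", "แมว"))    # dog-cat: 2 edges, so 1/3
print(thai_similarity("รถ", "รถเมล์"))   # best pair vehicle-bus: 1 edge, so 0.5
print(thai_similarity("หมา", "รถเมล์"))  # dog-bus: 4 edges, so 0.2
```

Taking the maximum over translation candidates is a simple way to cope with translation ambiguity: "รถ" maps to both "car" and "vehicle", and the closer sense wins. With a real WordNet, each English word would additionally map to several synsets, and the same maximisation would be applied over synset pairs.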
