Word Sense Disambiguation of Thai Language with Unsupervised Learning

Several strategies can resolve word sense ambiguity with a reasonable degree of accuracy: knowledge-based, corpus-based, and hybrid. This paper focuses on the corpus-based strategy, using an unsupervised learning method for disambiguation. We investigate the application of Latent Semantic Indexing (LSI), an unsupervised learning method, to the task of Thai noun and verb word sense disambiguation. We report experiments on two Thai polysemous words, /hua4/ and /kep1/, which serve as representatives of Thai nouns and verbs, respectively. The results demonstrate the effectiveness of vector-based distributional information measures for semantic disambiguation and indicate their potential. Our approach performs better than a baseline system that picks the most frequent sense.
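To make the two ingredients named above concrete, the following is a minimal sketch of LSI applied to context vectors, alongside a most-frequent-sense baseline. It is not the paper's implementation: the toy contexts (English glosses standing in for Thai tokens), the sense labels, the choice of k = 2, and all function names are assumptions made for illustration only.

```python
from collections import Counter

import numpy as np

# Hypothetical toy contexts around one ambiguous word; English glosses stand
# in for Thai tokens. The first two and last two contexts are meant to
# reflect two different senses (an illustrative assumption).
contexts = [
    "hair head pain doctor",
    "head pain hair wash",
    "garlic onion bulb cook",
    "onion bulb garlic market",
]
vocab = sorted({w for c in contexts for w in c.split()})
index = {w: i for i, w in enumerate(vocab)}


def term_context_matrix(docs):
    """Build a term-by-context count matrix (rows: terms, columns: contexts)."""
    m = np.zeros((len(vocab), len(docs)))
    for j, d in enumerate(docs):
        for w in d.split():
            m[index[w], j] += 1.0
    return m


def lsi_context_vectors(matrix, k):
    """Project contexts into a k-dimensional latent space via truncated SVD."""
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    return (np.diag(s[:k]) @ vt[:k]).T  # one row per context


def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def most_frequent_sense(labels):
    """Baseline: always predict the sense seen most often."""
    return Counter(labels).most_common(1)[0][0]


vectors = lsi_context_vectors(term_context_matrix(contexts), k=2)
same_sense = cosine(vectors[0], vectors[1])   # contexts sharing vocabulary
cross_sense = cosine(vectors[0], vectors[2])  # contexts with disjoint vocabulary
baseline = most_frequent_sense(["head", "head", "bulb"])  # hypothetical labels
```

In this toy setting, contexts that share vocabulary end up close together in the latent space, while contexts with disjoint vocabulary do not. How the resulting vectors are mapped to senses (for example by clustering) is left open here, since the abstract does not specify it.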
