Paper Information - A Study on Word Vector Models for Representing Korean Semantic Information

A Study on Word Vector Models for Representing Korean Semantic Information

This paper examines whether the Global Vectors (GloVe) model can be applied to Korean data as a universal learning algorithm. The main purpose of the study is to compare GloVe with the word2vec models, namely the continuous bag-of-words (CBOW) model and the skip-gram (SG) model. To this end, we conducted an experiment using an evaluation corpus consisting of 70 target words for the word similarity task and 819 pairs of Korean words for the word analogy task. In the word similarity task, the Pearson correlation coefficients with human judgments were 0.3133 for GloVe, 0.2637 for CBOW, and 0.2177 for SG. In the word analogy task, the overall accuracy across semantic and syntactic relations was 67% for GloVe, 66% for CBOW, and 57% for SG.
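The abstract reports a Pearson correlation for word similarity and an accuracy rate for word analogy, but does not spell out how the scores were computed. The following is a minimal sketch of how such scores are commonly obtained, assuming cosine similarity between word vectors for the similarity task and the vector-offset method (nearest neighbour of b - a + c) for the analogy task. The toy vectors, the human scores, and all function names are hypothetical illustrations, not the authors' data or code.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def similarity_correlation(vectors, scored_pairs):
    """Correlate model cosine similarities with human similarity ratings."""
    model_scores = [cosine(vectors[w1], vectors[w2]) for w1, w2, _ in scored_pairs]
    human_scores = [score for _, _, score in scored_pairs]
    return pearson(model_scores, human_scores)

def solve_analogy(vectors, a, b, c):
    """Answer 'a is to b as c is to ?' with the vector-offset method."""
    target = [vb - va + vc for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    # Assumption: the three question words are excluded from the candidates.
    candidates = (w for w in vectors if w not in (a, b, c))
    return max(candidates, key=lambda w: cosine(vectors[w], target))

def analogy_accuracy(vectors, questions):
    """Fraction of (a, b, c, expected) questions answered correctly."""
    correct = sum(solve_analogy(vectors, a, b, c) == d for a, b, c, d in questions)
    return correct / len(questions)

# Hypothetical 4-dimensional toy vectors (royalty, male, female, fruit).
toy_vectors = {
    "man":   [0.0, 1.0, 0.0, 0.0],
    "woman": [0.0, 0.0, 1.0, 0.0],
    "king":  [1.0, 1.0, 0.0, 0.0],
    "queen": [1.0, 0.0, 1.0, 0.0],
    "apple": [0.0, 0.0, 0.0, 1.0],
}

# Hypothetical human ratings, made up for illustration only.
toy_pairs = [("king", "queen", 8.0), ("king", "man", 6.0), ("king", "apple", 1.0)]
toy_questions = [("man", "woman", "king", "queen"), ("king", "queen", "man", "woman")]

if __name__ == "__main__":
    print("similarity correlation:", similarity_correlation(toy_vectors, toy_pairs))
    print("analogy accuracy:", analogy_accuracy(toy_vectors, toy_questions))
```

With real embeddings, `toy_vectors` would be replaced by the trained GloVe, CBOW, or SG vectors, and the pair and question lists by the evaluation corpus described above.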

Myoung-Wan Koo | Hyun Jung Lee | Hejung Yang | Young-In Lee | Sook Whan Cho
