Attention-Based Chinese Word Embedding

Recent studies have shown that the internal composition of Chinese words provides rich semantic information for Chinese word representation. A Chinese word consists of one or more Chinese characters, each of which carries semantic information of its own, and some characters have multiple meanings. Moreover, the characters that compose a word contribute to its meaning to different degrees. In response to this phenomenon, this paper proposes a new attention-based model (ACWE) to learn Chinese word representations. In addition, the “HIT IR-Lab Tongyici Cilin (Extended Version)” is used to compute the semantic similarity between Chinese characters and words, which reduces the impact of data sparseness and improves the effectiveness of the learned representations. We evaluate ACWE on a word similarity task and an analogical reasoning task, and the experimental results show that ACWE outperforms existing baseline models.
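The idea that characters contribute unequally to a word's meaning can be sketched as an attention-weighted composition of character embeddings. The function below is a minimal illustration, not the paper's exact ACWE formulation: the scoring scheme (dot product with the word embedding, followed by a softmax) and all names are assumptions.

```python
import numpy as np

def attention_compose(word_emb, char_embs):
    """Illustrative sketch: combine character embeddings into a word
    representation, weighting each character by its similarity to the
    word. The dot-product-plus-softmax scoring here is an assumption,
    not necessarily the ACWE model's actual attention mechanism."""
    # Score each character by its dot product with the word embedding.
    scores = char_embs @ word_emb
    # Softmax turns scores into attention weights summing to 1, so
    # characters semantically closer to the word contribute more.
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    # The composed representation is the attention-weighted sum of the
    # character embeddings.
    return weights @ char_embs

rng = np.random.default_rng(0)
word = rng.normal(size=8)          # hypothetical word embedding
chars = rng.normal(size=(3, 8))    # e.g. a three-character word
composed = attention_compose(word, chars)
print(composed.shape)  # (8,)
```

In such a scheme, a character whose embedding is closer to the word's receives a larger weight, which matches the paper's observation that different characters make different semantic contributions to the word.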