Learning Chinese word embeddings from character structural information

Abstract Word embedding is a fundamental task in natural language processing. Unlike English, Chinese subword units, such as characters, radicals, and components, carry rich semantic information that can be used to enhance word embeddings. However, existing methods neglect how much each subword unit semantically contributes to the word it belongs to. In this work, we employ an attention mechanism to capture the semantic structure of Chinese words and propose a novel framework, named the Attention-based multi-Layer Word Embedding model (ALWE). We also design an asynchronous strategy for updating the embeddings and attention weights efficiently. Our model learns to share subword information among distinct words selectively and adaptively. Experimental results on word similarity, word analogy, and text classification tasks show that the proposed model outperforms all baselines, especially for infrequent words. Qualitative analysis further demonstrates the superiority of ALWE.
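
To make the idea concrete, the sketch below illustrates one way an attention mechanism can weight subword (character, radical, or component) embeddings when composing a word embedding, so that semantically related subwords contribute more. This is a minimal NumPy illustration, not the paper's implementation: the function names, the dot-product attention scoring, and the equal averaging of the word and subword views are assumptions made for exposition.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D score vector."""
    e = np.exp(x - np.max(x))
    return e / e.sum()

def compose_word_embedding(word_vec, subword_vecs):
    """Attention-weighted composition of a word embedding with its
    subword (character/radical/component) embeddings.

    Hypothetical illustration: attention scores are the dot product
    between the word vector and each subword vector, so subwords whose
    meaning agrees with the word receive larger weights.
    """
    scores = subword_vecs @ word_vec        # one score per subword unit
    attn = softmax(scores)                  # attention distribution over subwords
    subword_part = attn @ subword_vecs      # weighted sum of subword embeddings
    return 0.5 * (word_vec + subword_part)  # average the word and subword views

# Toy usage: a word with three subword units in a 4-dimensional space.
rng = np.random.default_rng(0)
word = rng.normal(size=4)
subwords = rng.normal(size=(3, 4))
print(compose_word_embedding(word, subwords))
```

In a full model, the attention weights and embeddings would be learned jointly from a corpus (e.g., with a skip-gram-style objective), rather than computed from fixed random vectors as in this toy example.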
