Joint Learning of Character and Word Embeddings

Most word embedding methods take a word as the basic unit and learn embeddings from words' external contexts, ignoring the internal structure of words. However, in some languages, such as Chinese, a word is usually composed of several characters and contains rich internal information. The semantic meaning of a word is also related to the meanings of its composing characters. Hence, taking Chinese as an example, we present a character-enhanced word embedding model (CWE). To address the issues of character ambiguity and non-compositional words, we propose multiple-prototype character embeddings and an effective word selection method. We evaluate the effectiveness of CWE on word relatedness computation and analogical reasoning. The results show that CWE outperforms baseline methods that ignore internal character information. The code and data can be accessed at https://github.com/Leonard-Xu/CWE.
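The composition idea can be illustrated with a minimal sketch, assuming the additive variant in which a word's representation is the average of its word vector and the mean of its character vectors. The names here (`EMB_DIM`, `word_emb`, `char_emb`, `compose`) are hypothetical and not taken from the released code.

```python
import numpy as np

# Illustrative embedding dimension and toy lookup tables. In CWE these
# tables are trained jointly with a CBOW/Skip-gram objective over a
# Chinese corpus; here they are randomly initialized for demonstration.
EMB_DIM = 100
rng = np.random.default_rng(0)

word_emb = {"智能": rng.standard_normal(EMB_DIM)}
char_emb = {"智": rng.standard_normal(EMB_DIM),
            "能": rng.standard_normal(EMB_DIM)}

def compose(word: str) -> np.ndarray:
    """Character-enhanced representation: average the word's own vector
    with the mean of its characters' vectors (an assumed additive form)."""
    chars = [char_emb[c] for c in word]
    return 0.5 * (word_emb[word] + np.mean(chars, axis=0))

vec = compose("智能")
print(vec.shape)  # (100,)
```

In the full model, multiple-prototype character embeddings would replace the single `char_emb` lookup (selecting a character sense by position or context), and non-compositional words (e.g., transliterations) would skip the character term entirely.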
