Radical-Enhanced Chinese Character Embedding

In this paper, we present a method that leverages radicals for learning Chinese character embeddings. A radical is a semantic and phonetic component of a Chinese character, and it plays an important role in modelling character semantics: characters sharing a radical usually have similar meanings and grammatical usage. However, most existing character (or word) embedding learning algorithms model only syntactic contexts and ignore radical information; as a result, they do not explicitly capture the inner semantic connections among characters that share a radical in the embedding space. To address this problem, we propose to incorporate radical information to enhance Chinese character embeddings. We present a dedicated neural architecture with a hybrid loss function, integrating the radical information through a softmax applied to each character. To verify the effectiveness of the learned character embeddings, we apply them to Chinese word segmentation. Experimental results on two benchmark datasets show that our radical-enhanced method outperforms two widely used context-based embedding learning algorithms.
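
To make the hybrid-loss idea concrete, below is a minimal sketch of how such a radical-enhanced objective could be set up. This is not the authors' released implementation: the abstract does not specify the context model, so a CBOW-style softmax over the center character stands in for the context term, and the radical term is a softmax classifier predicting each character's radical from its embedding. The class name, the mixing weight `alpha`, and all vocabulary sizes are illustrative assumptions.

```python
# Hedged sketch of a radical-enhanced character embedding objective (PyTorch).
import torch
import torch.nn as nn
import torch.nn.functional as F


class RadicalEnhancedEmbedding(nn.Module):
    def __init__(self, num_chars, num_radicals, dim=50, alpha=0.5):
        super().__init__()
        self.embed = nn.Embedding(num_chars, dim)         # character embedding table
        self.context_out = nn.Linear(dim, num_chars)      # predicts the center character from its context
        self.radical_out = nn.Linear(dim, num_radicals)   # predicts each character's radical
        self.alpha = alpha                                 # mixing weight of the hybrid loss (assumed)

    def forward(self, context_chars, center_char, radical_of_center):
        # context_chars: (batch, window) ids of surrounding characters
        # center_char:   (batch,) id of the character to predict
        # radical_of_center: (batch,) radical id of the center character
        ctx_vec = self.embed(context_chars).mean(dim=1)           # average the context vectors
        context_loss = F.cross_entropy(self.context_out(ctx_vec), center_char)

        center_vec = self.embed(center_char)                      # embedding of the center character
        radical_loss = F.cross_entropy(self.radical_out(center_vec), radical_of_center)

        # hybrid loss: syntactic-context term plus radical-classification term
        return (1 - self.alpha) * context_loss + self.alpha * radical_loss


# Toy usage with made-up sizes (214 is the conventional number of Kangxi radicals).
model = RadicalEnhancedEmbedding(num_chars=5000, num_radicals=214, dim=50)
ctx = torch.randint(0, 5000, (8, 4))
center = torch.randint(0, 5000, (8,))
radical = torch.randint(0, 214, (8,))
loss = model(ctx, center, radical)
loss.backward()
```

The design choice the sketch illustrates is that the radical term ties together all characters sharing a radical, so their embeddings are pulled toward a common region even when their textual contexts differ.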
