Deep Learning for Chinese Word Segmentation and POS Tagging

This study explores the feasibility of performing Chinese word segmentation (CWS) and POS tagging by deep learning. We try to avoid task-specific feature engineering, and use deep layers of neural networks to discover relevant features to the tasks. We leverage large-scale unlabeled data to improve internal representation of Chinese characters, and use these improved representations to enhance supervised word segmentation and POS tagging models. Our networks achieved close to state-of-theart performance with minimal computational cost. We also describe a perceptron-style algorithm for training the neural networks, as an alternative to maximum-likelihood method, to speed up the training process and make the learning algorithm easier to be implemented.

[1]  Gina-Anne Levow,et al.  The Third International Chinese Language Processing Bakeoff: Word Segmentation and Named Entity Recognition , 2006, SIGHAN@COLING/ACL.

[2]  Michael Collins,et al.  Discriminative Training Methods for Hidden Markov Models: Theory and Experiments with Perceptron Algorithms , 2002, EMNLP.

[3]  Andrew McCallum,et al.  Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data , 2001, ICML.

[4]  Stephen Clark,et al.  Joint Word Segmentation and POS Tagging Using a Single Perceptron , 2008, ACL.

[5]  Andrew McCallum,et al.  Chinese Segmentation and New Word Detection using Conditional Random Fields , 2004, COLING.

[6]  Guodong Zhou,et al.  Chinese Word Segmentation and Named Entity Recognition Based on a Context-Dependent Mutual Information Independence Model , 2006, SIGHAN@COLING/ACL.

[7]  Weiwei Sun,et al.  Enhancing Chinese Word Segmentation Using Unlabeled Data , 2011, EMNLP.

[8]  Nianwen Xu,et al.  Chinese Word Segmentation as Character Tagging , 2003, Int. J. Comput. Linguistics Chin. Lang. Process..

[9]  Ronan Collobert,et al.  Deep Learning for Efficient Discriminative Parsing , 2011, AISTATS.

[10]  Andrew Y. Ng,et al.  Parsing Natural Scenes and Natural Language with Recursive Neural Networks , 2011, ICML.

[11]  Hai Zhao,et al.  An Improved Chinese Word Segmentation System with Conditional Random Field , 2006, SIGHAN@COLING/ACL.

[12]  Wen-Lian Hsu,et al.  On Closed Task of Chinese Word Segmentation: An Improved CRF Model Coupled with Character Clustering and Automatically Generated Template Matching , 2006, SIGHAN@COLING/ACL.

[13]  Xiaotie Deng,et al.  Accessor Variety Criteria for Chinese Word Extraction , 2004, CL.

[14]  Jason Weston,et al.  Natural Language Processing (Almost) from Scratch , 2011, J. Mach. Learn. Res..

[15]  D. E. Rumelhart,et al.  Learning internal representations by back-propagating errors , 1986 .

[16]  Xihong Wu,et al.  Chinese Word Segmentation with Maximum Entropy and N-gram Language Model , 2006, SIGHAN@COLING/ACL.

[17]  Weiwei Sun,et al.  A Stacked Sub-Word Model for Joint Chinese Word Segmentation and Part-of-Speech Tagging , 2011, ACL.

[18]  Stephen Clark,et al.  Chinese Segmentation with a Word-Based Perceptron Algorithm , 2007, ACL.

[19]  Qun Liu,et al.  A Cascaded Linear Model for Joint Chinese Word Segmentation and Part-of-Speech Tagging , 2008, ACL.

[20]  L. Bottou Stochastic Gradient Learning in Neural Networks , 1991 .

[21]  Kumiko Tanaka-Ishii,et al.  Unsupervised Segmentation of Chinese Text by Use of Branching Entropy , 2006, ACL.

[22]  Le Sun,et al.  Chinese Word Segmentation and Named Entity Recognition Based on Conditional Random Fields Models , 2006, SIGHAN@COLING/ACL.

[23]  Hitoshi Isahara,et al.  An Error-Driven Word-Character Hybrid Model for Joint Chinese Word Segmentation and POS Tagging , 2009, ACL/IJCNLP.

[24]  Hai Zhao,et al.  Integrating unsupervised and supervised word segmentation: The role of goodness measures , 2011, Inf. Sci..