MONPA: Multi-objective Named-entity and Part-of-speech Annotator for Chinese using Recurrent Neural Network

Part-of-speech (POS) tagging and named entity recognition (NER) are crucial steps in natural language processing. For languages such as Chinese, the difficulty of word segmentation places an additional burden on anyone processing text, and pipelined systems that perform these steps in sequence often suffer from error propagation. This work proposes an end-to-end model that uses a character-based recurrent neural network (RNN) to jointly perform segmentation, POS tagging and NER on a Chinese sentence. Experiments on existing word segmentation and NER datasets show that a single model with the proposed architecture performs comparably to models trained specifically for each task, and outperforms freely available software. Moreover, we provide a web-based interface so that the public can easily access this resource.
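The abstract does not specify how one character-level model emits segmentation, POS and NER output at the same time. A common way to frame the joint task as character sequence labeling is to pair a positional tag (B = begin, I = inside, E = end, S = single-character word) with a word-level label, so that the RNN predicts one combined tag per character. The sketch below assumes that scheme, with entity types such as PER and LOC folded into the same label set as POS tags. It is an illustration under those assumptions, not necessarily the tag set MONPA uses. The function names `encode_joint_tags` and `decode_joint_tags`, and the example labels, are hypothetical.

```python
from typing import List, Optional, Tuple


def encode_joint_tags(words: List[Tuple[str, str]]) -> List[str]:
    """Turn gold (word, label) pairs into one combined tag per character.

    Assumed scheme: "<position>-<label>", position in {B, I, E, S}.
    """
    tags: List[str] = []
    for word, label in words:
        if len(word) == 1:
            tags.append(f"S-{label}")  # single-character word
        else:
            tags.append(f"B-{label}")  # first character
            tags.extend(f"I-{label}" for _ in word[1:-1])  # middle characters
            tags.append(f"E-{label}")  # last character
    return tags


def decode_joint_tags(chars: str, tags: List[str]) -> List[Tuple[str, str]]:
    """Group characters back into (word, label) pairs from combined tags.

    A word takes the label of its first character; ill-formed sequences
    (e.g. a word starting with I or E) are tolerated rather than rejected.
    """
    if len(chars) != len(tags):
        raise ValueError("need exactly one tag per character")
    words: List[Tuple[str, str]] = []
    buf = ""                         # characters of the word being built
    buf_label: Optional[str] = None  # label of the word being built
    for ch, tag in zip(chars, tags):
        pos, _, label = tag.partition("-")
        if pos in ("B", "S") or buf_label is None:
            # Start a new word, flushing any unfinished one first.
            if buf:
                words.append((buf, buf_label))
            buf, buf_label = ch, label
        else:
            buf += ch  # I or E continues the current word
        if pos in ("E", "S"):
            words.append((buf, buf_label))
            buf, buf_label = "", None
    if buf:  # flush a word left open at the end of the sentence
        words.append((buf, buf_label))
    return words
```

For example, `encode_joint_tags([("王小明", "PER"), ("在", "P"), ("台北", "LOC")])` yields `["B-PER", "I-PER", "E-PER", "S-P", "B-LOC", "E-LOC"]`, and decoding those tags over `"王小明在台北"` recovers the same three (word, label) pairs. Because segmentation boundaries and labels live in a single tag, one softmax (or CRF) layer over the RNN outputs can predict all three tasks, which is what lets a joint model avoid the error propagation of a pipeline.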
