Chinese Word Segmentation with Heterogeneous Graph Neural Network

In recent years, deep learning has achieved significant success in the Chinese word segmentation (CWS) task. Most of these methods improve CWS performance by leveraging external information, e.g., words, sub-words, and syntax. However, existing approaches fail to effectively integrate this multi-level linguistic information and also ignore the structural features of the external information. Therefore, in this paper, we propose a framework to improve CWS, named HGNSeg. It fully exploits multi-level external information with a pre-trained language model and a heterogeneous graph neural network. Experimental results on six benchmark datasets (e.g., Bakeoff 2005, Bakeoff 2008) validate that our approach can effectively improve the performance of Chinese word segmentation. Importantly, in cross-domain scenarios, our method also shows a strong ability to alleviate the out-of-vocabulary (OOV) problem.
