Enhancing Chinese Word Segmentation via Pseudo Labels for Practicability

Pre-trained language models (e.g., BERT) significantly alleviate two traditionally challenging problems in Chinese word segmentation (CWS): segmentation ambiguity and out-of-vocabulary (OOV) words. However, such improvements are usually achieved on traditional benchmark datasets and do not bring CWS closer to an important goal: practicability (i.e., low complexity as a standalone task and high benefit to downstream tasks). To strike a trade-off between traditional evaluation and practicability for CWS, we propose a semi-supervised neural method based on pseudo labels. The method consists of a teacher model and a student model, and it distills knowledge from unlabeled data into the student model to improve both in-domain and out-of-domain CWS. Experiments show that the proposed method not only preserves the practicability of the lightweight student model but also effectively improves segmentation performance. We also evaluate a range of heterogeneous neural CWS architectures on downstream Chinese NLP tasks. Further experiments demonstrate that the proposed segmenter is reliable and practical as a pre-processing step for downstream NLP tasks at minimal cost.
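The abstract does not specify the teacher, the student, or how pseudo labels are selected. As a minimal sketch of the general pseudo-labeling idea only, the Python below assumes BMES character tags and a confidence threshold; `toy_teacher`, its lexicon, `min_confidence`, and all function names are hypothetical stand-ins for illustration and are not the authors' models or settings.

```python
from typing import Callable, List, Tuple

# (sentence, one BMES tag per character); the BMES scheme is an assumption here.
Tagged = Tuple[str, List[str]]
Teacher = Callable[[str], Tuple[List[str], float]]


def toy_teacher(sentence: str,
                lexicon: frozenset = frozenset({"我们", "喜欢", "分词"})
                ) -> Tuple[List[str], float]:
    """Hypothetical stand-in for a strong teacher: greedy lexicon matching.

    Confidence is the fraction of characters covered by lexicon words.
    """
    tags: List[str] = []
    covered, i = 0, 0
    while i < len(sentence):
        # Try a two-character lexicon word first, then fall back to one character.
        for length in (2, 1):
            piece = sentence[i:i + length]
            if length == 1 or piece in lexicon:
                break
        if len(piece) == 1:
            tags.append("S")
        else:
            tags.extend(["B"] + ["M"] * (len(piece) - 2) + ["E"])
            covered += len(piece)
        i += len(piece)
    confidence = covered / len(sentence) if sentence else 0.0
    return tags, confidence


def make_pseudo_labels(teacher: Teacher, unlabeled: List[str],
                       min_confidence: float = 0.9) -> List[Tagged]:
    """Label unlabeled sentences with the teacher; keep confident ones only."""
    pseudo: List[Tagged] = []
    for sentence in unlabeled:
        tags, confidence = teacher(sentence)
        if confidence >= min_confidence and len(tags) == len(sentence):
            pseudo.append((sentence, tags))
    return pseudo


def build_student_training_set(labeled: List[Tagged],
                               pseudo: List[Tagged]) -> List[Tagged]:
    """The lightweight student would be trained on gold plus pseudo labels."""
    return list(labeled) + list(pseudo)


def tags_to_words(sentence: str, tags: List[str]) -> List[str]:
    """Convert BMES tags back into a list of words."""
    words, current = [], ""
    for ch, tag in zip(sentence, tags):
        current += ch
        if tag in ("E", "S"):
            words.append(current)
            current = ""
    if current:
        words.append(current)
    return words


pseudo = make_pseudo_labels(toy_teacher, ["我们喜欢分词", "你好吗"])
train_set = build_student_training_set([("分词", ["B", "E"])], pseudo)
```

In this sketch the second sentence is discarded because the toy teacher covers none of its characters, so only confident teacher output reaches the student's training set.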
