ALBERT-Based Chinese Named Entity Recognition

Chinese named entity recognition (NER) is an important problem in natural language processing (NLP). Most existing methods rely on traditional deep learning models that cannot fully exploit contextual dependencies, which are crucial for capturing the relations between words or characters. To address this problem, language representation methods such as BERT have been proposed to learn global context information. Although these methods achieve good results, their large number of parameters limits efficiency and applicability in real-world scenarios. To improve both performance and efficiency, this paper proposes an ALBERT-based Chinese NER method that uses ALBERT, a lite version of BERT, as the pre-trained model; by sharing parameters across layers, it reduces the number of model parameters while improving performance. In addition, it uses a conditional random field (CRF) to capture sentence-level correlations between words or characters and thus alleviate tagging inconsistencies. Experimental results demonstrate that our method outperforms the comparison methods by 4.23–11.17% in relative F1-measure while using only about 4% of BERT's parameters.
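The described architecture pairs an ALBERT encoder with a CRF output layer: ALBERT produces contextual character representations, and the CRF models label transitions so the predicted tag sequence stays consistent. Below is a minimal sketch of such an ALBERT+CRF tagger in PyTorch, assuming the HuggingFace transformers library and the pytorch-crf package are available; the checkpoint name voidful/albert_chinese_tiny, the BIO tag set, and the class name AlbertCrfTagger are illustrative placeholders, not the paper's exact configuration.

```python
# Minimal sketch of an ALBERT + CRF tagger for Chinese NER (PyTorch).
# Assumptions: HuggingFace `transformers` and `pytorch-crf` are installed;
# the checkpoint name and tag set below are placeholders for illustration.
import torch
import torch.nn as nn
from transformers import AutoModel
from torchcrf import CRF

TAGS = ["O", "B-PER", "I-PER", "B-LOC", "I-LOC", "B-ORG", "I-ORG"]


class AlbertCrfTagger(nn.Module):
    def __init__(self, pretrained="voidful/albert_chinese_tiny", num_tags=len(TAGS)):
        super().__init__()
        # ALBERT encoder with cross-layer parameter sharing (few parameters).
        self.encoder = AutoModel.from_pretrained(pretrained)
        # Per-character emission scores over the tag set.
        self.classifier = nn.Linear(self.encoder.config.hidden_size, num_tags)
        # CRF layer models sentence-level tag dependencies.
        self.crf = CRF(num_tags, batch_first=True)

    def forward(self, input_ids, attention_mask, tags=None):
        hidden = self.encoder(input_ids, attention_mask=attention_mask).last_hidden_state
        emissions = self.classifier(hidden)
        mask = attention_mask.bool()
        if tags is not None:
            # Training: negative log-likelihood of the gold tag sequence under the CRF.
            return -self.crf(emissions, tags, mask=mask, reduction="mean")
        # Inference: Viterbi decoding returns the best tag-id sequence per sentence.
        return self.crf.decode(emissions, mask=mask)


# Example usage (character-level tokenization is natural for Chinese):
# from transformers import BertTokenizerFast
# tokenizer = BertTokenizerFast.from_pretrained("voidful/albert_chinese_tiny")
# batch = tokenizer(["上海是一个城市"], return_tensors="pt")
# model = AlbertCrfTagger()
# pred_tag_ids = model(batch["input_ids"], batch["attention_mask"])
```

The CRF replaces an independent per-character softmax: instead of scoring each tag in isolation, it scores whole tag sequences, which discourages invalid transitions such as "O" followed by "I-PER".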
