Paper Information - Chinese Named Entity Recognition Based on Character-Word Vector Fusion

Chinese Named Entity Recognition Based on Character-Word Vector Fusion

Because Chinese text has no explicit markers for word boundaries, named entities are often harder to identify in Chinese than in English. Chinese named entity recognition models are currently trained on pretrained character or word vector models. Each choice has a drawback: using character vectors as the neural network input discards the semantic meaning of words and their explicit boundary information, while using word vectors makes the model depend on the accuracy of the word segmentation algorithm. To address both problems, this paper proposes CWVF-BiLSTM-CRF (Character Word Vector Fusion-Bidirectional Long Short-Term Memory Networks-Conditional Random Field), a Chinese named entity recognition model based on character-word vector fusion. First, Word2Vec is used to build a character-to-character-vector dictionary and a word-to-word-vector dictionary. Second, the character and word vectors are fused to form the input unit of the BiLSTM (Bidirectional Long Short-Term Memory) network. Finally, a CRF (Conditional Random Field) is used to resolve the problem of unreasonable tag sequences. The proposed model reduces the dependence on the accuracy of the word segmentation algorithm and makes effective use of the semantic characteristics of words. The experimental results show that the model based on character-word vector fusion improves Chinese named entity recognition performance.
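The fusion step can be illustrated with a small sketch. The abstract does not state how the character and word vectors are combined, so the code below assumes plain concatenation: each character's vector is joined with the vector of the segmented word that contains it, giving one fused vector per character. The function name `fuse_char_word_vectors`, the zero-vector fallback for out-of-vocabulary items, and the toy lookup tables are all hypothetical stand-ins for the Word2Vec dictionaries, not the authors' implementation.

```python
from typing import Dict, List, Sequence

Vector = List[float]


def fuse_char_word_vectors(
    words: Sequence[str],
    char_vecs: Dict[str, Vector],
    word_vecs: Dict[str, Vector],
    char_dim: int,
    word_dim: int,
) -> List[Vector]:
    """Return one fused vector per character of a segmented sentence.

    Assumption: fusion is concatenation of the character vector with the
    vector of the word the character belongs to.
    """
    fused: List[Vector] = []
    for word in words:
        # Assumption: unknown words and characters map to zero vectors.
        w_vec = word_vecs.get(word, [0.0] * word_dim)
        for ch in word:
            c_vec = char_vecs.get(ch, [0.0] * char_dim)
            fused.append(c_vec + w_vec)  # list concatenation
    return fused


# Toy lookup tables standing in for Word2Vec output (values are made up).
char_vecs = {"南": [0.1, 0.2], "京": [0.3, 0.4], "市": [0.5, 0.6]}
word_vecs = {"南京": [1.0, 0.0, 0.0], "市": [0.0, 1.0, 0.0]}

# The sentence is given already segmented into words.
fused = fuse_char_word_vectors(["南京", "市"], char_vecs, word_vecs, 2, 3)
for ch, vec in zip("南京市", fused):
    print(ch, vec)
```

In a full model, the sequence of fused vectors would be the per-character input to the BiLSTM, whose outputs a CRF layer would then decode into a tag sequence. Because every character keeps its own vector, a segmentation error only distorts the word half of the fused input, which is one way to read the claim of reduced dependence on the segmenter.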

Lili Dong | Na Ye | Xin Qin | Xiang Zhang | Kangkang Sun
