Chinese named entity recognition in power domain based on Bi-LSTM-CRF

Efficient recognition of domain-specific named entities is a fundamental task for text data mining and intelligent applications in the power domain. Traditional power domain Named Entity Recognition (NER) methods rely heavily on feature engineering and cannot learn power entity features automatically. To learn entity features automatically and extract power domain named entities efficiently, this paper proposes a model based on Bidirectional Long Short-Term Memory Neural Networks (Bi-LSTM) and a Conditional Random Field (CRF). Word representations are fed into the neural network as an additional feature, and Skip-gram embeddings are trained on a power domain corpus. Experimental results show a precision above 88.25% and a recall above 88.04%, confirming that the Bi-LSTM-CRF method is effective for named entity recognition in the power domain.
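In a Bi-LSTM-CRF tagger, the Bi-LSTM typically produces a score for every tag at every character, and the CRF layer adds tag-to-tag transition scores so the whole tag sequence is decoded jointly rather than character by character. The following is a minimal sketch of standard linear-chain Viterbi decoding under that assumption. The function name `viterbi_decode`, the three-tag O/B/I scheme, and all scores are hypothetical illustrations, not the paper's implementation.

```python
from typing import List, Sequence


def viterbi_decode(emissions: Sequence[Sequence[float]],
                   transitions: Sequence[Sequence[float]]) -> List[int]:
    """Return the highest-scoring tag index sequence (hypothetical helper).

    emissions[t][j]: score of tag j at position t (assumed to come from a Bi-LSTM).
    transitions[i][j]: CRF score for moving from tag i to tag j.
    """
    n_tags = len(transitions)
    # scores[j]: best score of any tag path ending in tag j at the current position
    scores = list(emissions[0])
    backpointers: List[List[int]] = []
    for t in range(1, len(emissions)):
        new_scores: List[float] = []
        pointers: List[int] = []
        for j in range(n_tags):
            # Pick the previous tag i that maximizes path score + transition i -> j.
            best_i = max(range(n_tags), key=lambda i: scores[i] + transitions[i][j])
            new_scores.append(scores[best_i] + transitions[best_i][j] + emissions[t][j])
            pointers.append(best_i)
        scores = new_scores
        backpointers.append(pointers)
    # Backtrack from the best final tag to recover the full path.
    path = [max(range(n_tags), key=lambda j: scores[j])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Hypothetical tags: 0 = O, 1 = B, 2 = I; the O -> I transition is heavily penalized.
emissions = [[1.0, 0.8, 0.0], [0.0, 0.0, 1.0], [0.0, 0.0, 1.0]]
transitions = [[0.0, 0.0, -10.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
print(viterbi_decode(emissions, transitions))  # [1, 2, 2], i.e. B I I
```

A per-character argmax over the same emissions would choose O, I, I, which is an invalid BIO sequence; the transition penalty lets the CRF prefer the valid B, I, I path instead.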
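The Skip-gram objective trains each token's embedding to predict the tokens around it. A minimal sketch of the first step, generating (center, context) training pairs from a tokenized sentence, is shown below. It assumes character-level tokens and a fixed window; the function name `skipgram_pairs` and the example string 主变压器 ("main transformer") are hypothetical illustrations. Full word2vec training also uses negative sampling or hierarchical softmax and frequent-word subsampling; in practice a library such as gensim (`Word2Vec` with `sg=1`) handles this.

```python
from typing import List, Tuple


def skipgram_pairs(tokens: List[str], window: int = 2) -> List[Tuple[str, str]]:
    """Generate (center, context) pairs for Skip-gram training (hypothetical helper).

    Each token is paired with every other token at most `window` positions away.
    """
    pairs: List[Tuple[str, str]] = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


# Hypothetical character-level tokens for a power domain term.
print(skipgram_pairs(["主", "变", "压", "器"], window=1))
```

With `window=1` this yields six pairs, such as ('主', '变') and ('变', '压'); embeddings trained on such pairs from a power domain corpus place characters that share contexts close together.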
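Precision is the fraction of predicted entities that are correct, and recall is the fraction of gold entities that are found. The sketch below assumes entity-level exact matching over BIO tags, a common NER convention; the paper's exact tagging and scoring scheme is not stated here. The helper names and the entity type labels `EQU` and `LOC` are hypothetical.

```python
from typing import List, Optional, Set, Tuple

Span = Tuple[int, int, str]  # (start, end_exclusive, entity_type)


def bio_spans(tags: List[str]) -> Set[Span]:
    """Collect entity spans from BIO tags such as ['B-EQU', 'I-EQU', 'O'] (hypothetical helper)."""
    spans: Set[Span] = set()
    start: Optional[int] = None
    etype: Optional[str] = None
    for i, tag in enumerate(tags + ["O"]):  # trailing "O" closes a final entity
        # An open entity ends at O, at a new B-, or at an I- of a different type.
        ends = tag == "O" or tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != etype)
        if ends and start is not None:
            spans.add((start, i, etype))
            start, etype = None, None
        if tag.startswith("B-") or (tag.startswith("I-") and start is None):
            # A B- tag (or an I- tag with no open entity) starts a new span.
            start, etype = i, tag[2:]
    return spans


def precision_recall(gold: List[str], pred: List[str]) -> Tuple[float, float]:
    """Entity-level precision and recall: a prediction counts only on exact span and type match."""
    gold_spans, pred_spans = bio_spans(gold), bio_spans(pred)
    correct = len(gold_spans & pred_spans)
    precision = correct / len(pred_spans) if pred_spans else 0.0
    recall = correct / len(gold_spans) if gold_spans else 0.0
    return precision, recall


gold = ["B-EQU", "I-EQU", "O", "B-LOC", "I-LOC"]
pred = ["B-EQU", "I-EQU", "O", "O", "O"]
print(precision_recall(gold, pred))  # (1.0, 0.5)
```

Here the single predicted entity is correct (precision 1.0), but only one of the two gold entities is found (recall 0.5).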
