Neural Approach for Named Entity Recognition

This work presents results for a bidirectional long short-term memory (BiLSTM) neural network with a conditional random field (CRF) layer applied to named entity recognition (NER), a natural language processing (NLP) task. NER makes it possible to recognize and identify specific entities that are relevant for search in a particular data domain. A generalized NER algorithm and a neural approach to NER based on the BiLSTM-CRF model are presented. The CRF layer predicts where the target named entities appear and improves the recognition quality metrics. The output of the neural network is the input text with the recognized named entities marked. Weakly structured resume text is proposed as the experimental material for named entity recognition with the BiLSTM-CRF model. Ten types of named entities are chosen for processing, including person, date, location, and organization. A manually annotated corpus of resume documents, created by the authors, was used as the data set for training, validating, and testing the BiLSTM-CRF model. The adequacy of the proposed approach was assessed using precision, recall, and the balanced F1 measure. The average values on the test set were: precision 79.06%, recall 71.51%, and F1 75.09%. The best scores were obtained for the named entity “date”: precision 92.12%, recall 81.60%, and F1 86.54%. The developed neural model and software have practical value for resume summarization and job candidate ranking, since they can be used to build the input data for these tasks.
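The balanced F1 measure is the harmonic mean of precision and recall. The short sketch below shows how the reported F1 for the “date” entity follows from its precision and recall. The function name `f1_score` is a hypothetical helper chosen for illustration and is not taken from the authors' software.

```python
def f1_score(precision: float, recall: float) -> float:
    """Balanced F1: harmonic mean of precision and recall.

    Hypothetical helper for illustration; inputs and output share the
    same units (here, percent).
    """
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both scores are zero
    return 2 * precision * recall / (precision + recall)


# Precision and recall reported for the "date" entity, in percent.
date_f1 = f1_score(92.12, 81.60)
print(f"{date_f1:.2f}")  # prints 86.54
```

Applying the same formula to the average precision and recall gives roughly 75.1%, close to the reported 75.09%. The small gap could come from rounding or from averaging F1 per entity type, which the abstract does not specify.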
