The Named Entity Recognition of Chinese Cybersecurity Using an Active Learning Strategy

In data-driven big data security analysis, knowledge graph-based organization, association mining, and inference analysis of multisource heterogeneous threat data are attracting increasing interest in the field of cybersecurity. Although deep learning-based knowledge graph construction has achieved great success, building a large-scale, high-quality, domain-specific knowledge graph requires manual annotation of large corpora, which makes it very difficult. To tackle this problem, we present a straightforward active learning strategy for cybersecurity entity recognition built on deep learning. A pre-trained BERT model and residual dilation convolutional neural networks (RDCNN) are introduced to learn entity context features, and a conditional random field (CRF) layer is employed as the tag decoder. Then, taking advantage of the model outputs and the distribution of cybersecurity entities, we propose an active learning strategy named TPCL that considers uncertainty, confidence, and diversity. We evaluate TPCL on both general-domain and cybersecurity datasets. The experimental results show that TPCL outperforms traditional strategies in terms of accuracy and F1. Moreover, TPCL performs better in the cybersecurity domain than in the general domain, which makes it well suited to Chinese entity recognition in this field.
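To make the idea of uncertainty-driven selection concrete, the following is a minimal sketch of one common baseline, length-normalized least-confidence sampling. It is not the TPCL strategy, whose scoring is not specified in this abstract, and it covers only the uncertainty criterion, not confidence or diversity. The function names and the toy probabilities are hypothetical; each inner list is assumed to hold the model's probability for the predicted tag of each token, with every value greater than zero.

```python
import math


def sequence_uncertainty(token_probs):
    """Length-normalized negative log-probability of the predicted tags.

    `token_probs` is assumed to contain one probability in (0, 1] per token.
    Higher return values mean the model is less sure about the sentence.
    """
    if not token_probs:
        return 0.0
    return -sum(math.log(p) for p in token_probs) / len(token_probs)


def select_for_annotation(pool, budget):
    """Return the indices of the `budget` most uncertain sentences in `pool`."""
    ranked = sorted(
        range(len(pool)),
        key=lambda i: sequence_uncertainty(pool[i]),
        reverse=True,
    )
    return ranked[:budget]


# Toy unlabeled pool: per-token probabilities for three sentences (made up).
pool = [
    [0.99, 0.98, 0.97],  # confident prediction
    [0.60, 0.55, 0.90],  # uncertain prediction
    [0.80, 0.85],        # moderately uncertain prediction
]
selected = select_for_annotation(pool, budget=2)
print(selected)
```

In an active learning loop, the selected sentences would be sent for manual annotation, added to the training set, and the model retrained before the next round of selection. Normalizing by length keeps long sentences from being chosen simply because they contain more tokens.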
