CyBERT: Cybersecurity Claim Classification by Fine-Tuning the BERT Language Model

We introduce CyBERT, a cybersecurity feature claims classifier based on bidirectional encoder representations from transformers (BERT) and a key component of our semi-automated cybersecurity vetting of industrial control systems (ICS). To train CyBERT, we created a corpus of labeled sequences from ICS device documentation collected across a wide range of vendors and devices. This corpus provides the foundation for fine-tuning BERT's language model and includes a prediction-guided relabeling process. We propose an approach to finding optimal hyperparameters, including the learning rate and the number and configuration of dense layers, to increase the accuracy of our classifier. Fine-tuning all hyperparameters of the resulting model raised classification accuracy from 76%, obtained with BertForSequenceClassification's original architecture, to 94.4% with CyBERT. Furthermore, we evaluated the impact of randomness in the initialization, training, and data-sampling phases: CyBERT's validation accuracy varied with a standard deviation of only ±0.6% across 100 random seeds. Finally, we compared CyBERT to other well-established language models, including GPT-2, ULMFiT, and ELMo, as well as to neural network models such as CNN, LSTM, and BiLSTM. The results show that CyBERT outperforms these models in both validation accuracy and F1 score, confirming its robustness and accuracy as a cybersecurity feature claims classifier.
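To illustrate the kind of architecture the abstract describes, the sketch below shows a pre-trained BERT encoder with a configurable stack of dense classification layers in place of BertForSequenceClassification's single linear head. It assumes the HuggingFace transformers library and PyTorch; the checkpoint name, layer sizes, dropout rate, and example sentence are illustrative placeholders, not the authors' implementation or the tuned values reported for CyBERT.

import torch.nn as nn
from transformers import BertModel, BertTokenizer

class ClaimClassifier(nn.Module):
    def __init__(self, n_classes=2, hidden_sizes=(256,), dropout=0.1):
        super().__init__()
        # Pre-trained BERT encoder; its weights are fine-tuned together with the head.
        self.bert = BertModel.from_pretrained("bert-base-uncased")
        layers, in_dim = [], self.bert.config.hidden_size
        for h in hidden_sizes:  # configurable stack of dense layers (a tunable hyperparameter)
            layers += [nn.Linear(in_dim, h), nn.ReLU(), nn.Dropout(dropout)]
            in_dim = h
        layers.append(nn.Linear(in_dim, n_classes))  # final classification layer
        self.head = nn.Sequential(*layers)

    def forward(self, input_ids, attention_mask):
        out = self.bert(input_ids=input_ids, attention_mask=attention_mask)
        cls = out.last_hidden_state[:, 0]  # [CLS] token representation
        return self.head(cls)

# Hypothetical usage on a single claim-like sentence.
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = ClaimClassifier(n_classes=2, hidden_sizes=(256,))
batch = tokenizer(["The device supports role-based access control and TLS 1.2."],
                  padding=True, truncation=True, return_tensors="pt")
logits = model(batch["input_ids"], batch["attention_mask"])  # shape: (1, n_classes)

Searching over the number and width of these dense layers, together with the learning rate, corresponds to the hyperparameter-tuning step the abstract refers to.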
