CyNER: A Python Library for Cybersecurity Named Entity Recognition

Open cyber threat intelligence (OpenCTI) information is available in an unstructured format from heterogeneous sources on the Internet. We present CyNER, an open-source Python library for cybersecurity named entity recognition (NER). CyNER combines transformer-based models for extracting cybersecurity-related entities, heuristics for extracting different indicators of compromise, and publicly available NER models for generic entity types. We provide models trained on a diverse corpus that users can readily use. Events are modeled as classes defined in previous research, MALOnt2.0 (Christian et al., 2021) and MALOnt (Rastogi et al., 2020); together, these classes cover a wide range of malware attack details in a threat intelligence corpus. The user can combine predictions from multiple approaches to suit their needs. The library is made publicly available.
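
The combination of heuristic and model-based predictions described above can be illustrated with a minimal sketch. This is not CyNER's actual API: the `Span` type, the `IOC_PATTERNS` table, the `heuristic_spans` and `merge_spans` helpers, and the hard-coded `model_spans` (a stand-in for a transformer model's output) are all hypothetical names introduced for illustration, and the regular expressions are deliberately simplistic.

```python
import re
from typing import NamedTuple


class Span(NamedTuple):
    start: int   # character offset where the entity begins
    end: int     # character offset just past the entity
    label: str   # entity type
    source: str  # which approach produced the span


# Hypothetical, simplified patterns for a few indicators of compromise.
IOC_PATTERNS = {
    "IPv4": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
    "MD5": re.compile(r"\b[a-fA-F0-9]{32}\b"),
    "CVE": re.compile(r"\bCVE-\d{4}-\d{4,}\b"),
}


def heuristic_spans(text):
    """Return one Span per regex match, tagged as coming from heuristics."""
    spans = []
    for label, pattern in IOC_PATTERNS.items():
        for match in pattern.finditer(text):
            spans.append(Span(match.start(), match.end(), label, "heuristic"))
    return spans


def merge_spans(*span_lists):
    """Merge span lists; earlier lists win when two spans overlap."""
    kept = []
    for spans in span_lists:
        for s in spans:
            # Keep a span only if it overlaps nothing already kept.
            if all(s.end <= k.start or s.start >= k.end for k in kept):
                kept.append(s)
    return sorted(kept, key=lambda s: s.start)


text = "Emotet contacted 192.168.10.5 and exploited CVE-2017-11882."
# Assumed model output; a real system would obtain this from an NER model.
model_spans = [Span(0, 6, "Malware", "model")]
merged = merge_spans(heuristic_spans(text), model_spans)

for span in merged:
    print(span.label, text[span.start:span.end], span.source)
```

Giving priority by argument order is only one possible design; a user could equally prefer model predictions over heuristics by swapping the arguments to `merge_spans`.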

[1] Asahi Ushio, et al. T-NER: An All-Round Python Library for Transformer-based Named Entity Recognition, 2022, EACL.

[2] Nidhi Rastogi, et al. An Ontology-driven Knowledge Graph for Android Malware, 2021, CCS.

[3] Chun Guo, et al. The Named Entity Recognition of Chinese Cybersecurity Using an Active Learning Strategy, 2021, Wirel. Commun. Mob. Comput.

[4] Tim Finin, et al. Creating Cybersecurity Knowledge Graphs From Malware After Action Reports, 2020, IEEE Access.

[5] Mohammed J. Zaki, et al. MALOnt: An Ontology for Malware Threat Intelligence, 2020, Deployable Machine Learning for Security Defense.

[6] Jaechoon Jo, et al. Automatic extraction of named entities of cyber threats using a deep Bi-LSTM-CRF network, 2020, International Journal of Machine Learning and Cybernetics.

[7] Myle Ott, et al. Unsupervised Cross-lingual Representation Learning at Scale, 2019, ACL.

[8] Rémi Louf, et al. HuggingFace's Transformers: State-of-the-art Natural Language Processing, 2019, ArXiv.

[9] Omer Levy, et al. RoBERTa: A Robustly Optimized BERT Pretraining Approach, 2019, ArXiv.

[10] Roland Vollgraf, et al. FLAIR: An Easy-to-Use Framework for State-of-the-Art NLP, 2019, NAACL.

[11] Katrin Franke, et al. Extracting cyber threat intelligence from hacker forums: Support vector machines versus convolutional neural networks, 2017, 2017 IEEE International Conference on Big Data (Big Data).

[12] Frank Hutter, et al. Decoupled Weight Decay Regularization, 2017, ICLR.

[13] Timothy W. Finin, et al. Extracting Cybersecurity Related Linked Data from Text, 2013, 2013 IEEE Seventh International Conference on Semantic Computing.

[14] Sampo Pyysalo, et al. brat: a Web-based Tool for NLP-Assisted Text Annotation, 2012, EACL.

[15] Mitchell P. Marcus, et al. OntoNotes: The 90% Solution, 2006, NAACL.

[16] Bo Jiang, et al. Cybersecurity Named Entity Recognition Using Multi-Modal Ensemble Learning, 2020, IEEE Access.

[17] Ming-Wei Chang, et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, 2019, NAACL.