AISecKG: Knowledge Graph Dataset for Cybersecurity Education

Cybersecurity education is exceptionally challenging because it involves learning complex attacks and tools and developing the critical problem-solving skills needed to defend systems. A student or novice researcher in the cybersecurity domain needs an adaptive learning strategy that can break down complex tasks and concepts into simple representations. An AI-enabled automated cybersecurity education system can improve cognitive engagement and active learning. Knowledge graphs (KGs) represent data visually as a graph and support reasoning over and interpretation of the underlying data, making them suitable for education and interactive learning. However, there are no publicly available datasets for the cybersecurity education domain with which to build such systems. The data exists as unstructured educational course material, Wiki pages, capture the flag (CTF) writeups, etc. Creating knowledge graphs from unstructured text is challenging without an ontology or an annotated dataset, and data annotation for cybersecurity requires domain experts. To address these gaps, we make three contributions in this paper. First, we propose an ontology for the cybersecurity education domain aimed at students and novice learners. Second, we develop AISecKG, a triple dataset of cybersecurity-related entities and relations as defined by the ontology. This dataset can be used to construct knowledge graphs that teach cybersecurity and promote cognitive learning. It can also be used to build downstream applications such as recommendation systems or self-learning question-answering systems for students. The dataset would also help identify malicious named entities and their probable impact. Third, using this dataset, we demonstrate a downstream application that extracts custom named entities from texts and educational material on cybersecurity.
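As a rough illustration of how a triple dataset can back a knowledge graph, the following minimal Python sketch indexes (head, relation, tail) triples as an adjacency map and looks up the facts attached to an entity. The triples, relation names, and helper functions here are hypothetical and invented for illustration; they are not taken from AISecKG or its ontology.

```python
from collections import defaultdict

# Hypothetical (head, relation, tail) triples, invented for illustration.
# They are NOT taken from the AISecKG dataset or its ontology.
TRIPLES = [
    ("nmap", "can_scan", "open port"),
    ("firewall", "can_block", "open port"),
    ("port scan", "uses", "nmap"),
]


def build_graph(triples):
    """Index triples as an adjacency map: head -> list of (relation, tail)."""
    graph = defaultdict(list)
    for head, relation, tail in triples:
        graph[head].append((relation, tail))
    return dict(graph)


def neighbors(graph, entity):
    """Return the outgoing (relation, tail) pairs for an entity, or an empty list."""
    return graph.get(entity, [])


graph = build_graph(TRIPLES)
nmap_facts = neighbors(graph, "nmap")
```

A lookup such as `neighbors(graph, "nmap")` is the kind of query a question-answering or recommendation application might issue against the graph; a real system would more likely load the triples into a graph database than keep them in a dictionary.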
