Tracing CVE Vulnerability Information to CAPEC Attack Patterns Using Natural Language Processing Techniques

For effective vulnerability management, vulnerability and attack information must be collected quickly and efficiently. A security knowledge repository can collect such information. The Common Vulnerabilities and Exposures (CVE) repository lists known vulnerabilities of products, while the Common Attack Pattern Enumeration and Classification (CAPEC) repository stores attack patterns, which describe the common attributes and approaches that adversaries employ to exploit known weaknesses. Because the information in these two repositories is not linked, identifying the CAPEC attack information related to a given CVE vulnerability is challenging. Currently, the related CAPEC-ID can be traced from a CVE-ID via the Common Weakness Enumeration (CWE) in some but not all cases. Here, we propose a method to automatically trace the related CAPEC-IDs from a CVE-ID using three similarity measures: TF–IDF, Universal Sentence Encoder (USE), and Sentence-BERT (SBERT). We prepared 58 CVE-IDs as test input data and tested whether the CAPEC-IDs related to each of them could be traced. The experiment indicates that TF–IDF is the best of the three similarity measures, as it traced 48 of the 58 CVE-IDs to the related CAPEC-ID.
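
The TF–IDF variant of this tracing idea can be illustrated with a minimal sketch: treat the CVE description as a query, vectorize it together with the candidate attack pattern descriptions, and rank the patterns by cosine similarity. The code below is not the authors' implementation. The tokenizer, the smoothed IDF formula, the function names, and the three toy pattern descriptions are all assumptions made for illustration; real CVE and CAPEC text would be loaded from the respective repositories.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and split on non-alphanumeric characters (an assumed, very simple tokenizer).
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    # Build one sparse TF-IDF vector (term -> weight) per document.
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    # Smoothed IDF; this particular formula is an assumption, not taken from the paper.
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    return [{t: f * idf[t] for t, f in Counter(toks).items()} for toks in tokenized]


def cosine(u, v):
    # Cosine similarity between two sparse vectors stored as dicts.
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_patterns(cve_text, patterns):
    # Rank attack patterns (id -> description) by similarity to the CVE description.
    ids = list(patterns)
    vectors = tfidf_vectors([cve_text] + [patterns[i] for i in ids])
    query = vectors[0]
    scored = [(i, cosine(query, v)) for i, v in zip(ids, vectors[1:])]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy, hand-written descriptions (hypothetical; not actual CVE or CAPEC entries).
cve_description = (
    "SQL injection in the login form allows remote attackers to execute "
    "arbitrary SQL commands via the username parameter"
)
toy_patterns = {
    "sql-injection": "An attacker injects crafted SQL commands through user input "
                     "to manipulate database queries",
    "xss": "An attacker injects malicious script into web pages viewed by other users",
    "buffer-overflow": "An attacker writes data past the bounds of a memory buffer "
                       "to corrupt adjacent memory",
}

ranking = rank_patterns(cve_description, toy_patterns)
for pattern_id, score in ranking:
    print(f"{pattern_id}: {score:.3f}")
```

In this sketch the top-ranked entries would be presented as candidate attack patterns for the CVE. The embedding-based measures named above (USE and SBERT) would replace `tfidf_vectors` with sentence embeddings while keeping the same cosine ranking step.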
