Generating Informative CVE Description From ExploitDB Posts by Extractive Summarization

ExploitDB is an important public website that contributes a large number of vulnerabilities to the official CVE database. Over 60% of these vulnerabilities carry high or critical security risks. Unfortunately, over 73% of exploits appear publicly earlier than the corresponding CVEs, and about 40% of exploits do not have CVEs at all. To assist in documenting CVEs for ExploitDB posts, we propose an open information method to extract 9 key vulnerability aspects (vulnerable product/version/component, vulnerability type, vendor, attacker type, root cause, attack vector and impact) from the verbose and noisy ExploitDB posts. The aspects extracted from an ExploitDB post are then composed into a CVE description according to the suggested CVE description templates; this description is required information when requesting a new CVE. Through an evaluation on 13,017 manually labeled sentences and a statistical sampling of 3,456 extracted aspects, we confirm the high accuracy of our extraction method. Compared with 27,230 reference CVE descriptions, our composed CVE descriptions achieve a high ROUGE-L score (0.38); ROUGE-L is a longest-common-subsequence-based metric for evaluating text summarization methods.
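
To make the ROUGE-L metric concrete, the following is a minimal sketch of a sentence-level ROUGE-L F-score built on the longest common subsequence (LCS) of word tokens. The whitespace tokenization, the lowercasing, the `beta=1.0` weighting, and the function names `lcs_length` and `rouge_l` are assumptions made for illustration; the abstract does not state which ROUGE-L variant or settings were used, and the two example strings are invented rather than taken from the dataset.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)  # DP row for the previous token of `a`
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(candidate, reference, beta=1.0):
    """Sentence-level ROUGE-L F-score (assumed variant: whitespace tokens, lowercased)."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand or not ref:
        return 0.0
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)
    recall = lcs / len(ref)
    return (1 + beta ** 2) * precision * recall / (recall + beta ** 2 * precision)


# Hypothetical composed description vs. hypothetical reference description.
composed = "SQL injection in foo allows remote attackers"
reference = "SQL injection vulnerability in foo allows remote attackers to execute commands"
score = rouge_l(composed, reference)
print(round(score, 4))
```

Because the LCS only requires tokens to appear in the same order, not contiguously, a composed description can score well even when the reference interleaves extra words between the extracted aspects.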
