Introducing a New Dataset for Event Detection in Cybersecurity Texts

Detecting cybersecurity events is necessary to keep us informed about the fast-growing number of such events reported in text. In this work, we focus on the task of event detection (ED), which identifies event trigger words, for the cybersecurity domain. In particular, to facilitate future research, we introduce a new dataset for this problem, featuring manual annotation for 30 important cybersecurity event types and a size large enough to develop deep learning models. Compared to prior datasets for this task, our dataset involves more event types and supports the modeling of document-level information to improve performance. We perform an extensive evaluation of current state-of-the-art ED methods on the proposed dataset. Our experiments reveal the challenges of cybersecurity ED and present many research opportunities in this area for future work.
