TTPDrill: Automatic and Accurate Extraction of Threat Actions from Unstructured Text of CTI Sources

With the rapid growth of the cyber attacks, sharing of cyber threat intelligence (CTI) becomes essential to identify and respond to cyber attack in timely and cost-effective manner. However, with the lack of standard languages and automated analytics of cyber threat information, analyzing complex and unstructured text of CTI reports is extremely time- and labor-consuming. Without addressing this challenge, CTI sharing will be highly impractical, and attack uncertainty and time-to-defend will continue to increase. Considering the high volume and speed of CTI sharing, our aim in this paper is to develop automated and context-aware analytics of cyber threat intelligence to accurately learn attack pattern (TTPs) from commonly available CTI sources in order to timely implement cyber defense actions. Our paper has three key contributions. First, it presents a novel threat-action ontology that is sufficiently rich to understand the specifications and context of malicious actions. Second, we developed a novel text mining approach that combines enhanced techniques of Natural Language Processing (NLP) and Information retrieval (IR) to extract threat actions based on semantic (rather than syntactic) relationship. Third, our CTI analysis can construct a complete attack pattern by mapping each threat action to the appropriate techniques, tactics and kill chain phases, and translating it any threat sharing standards, such as STIX 2.1. Our CTI analytic techniques were implemented in a tool, called TTPDrill, and evaluated using a randomly selected set of Symantec Threat Reports. Our evaluation tests show that TTPDrill achieves more than 82% of precision and recall in a variety of measures, very reasonable for this problem domain.

[1]  Tudor Dumitras,et al.  Vulnerability Disclosure in the Age of Social Media: Exploiting Twitter for Predicting Real-World Exploits , 2015, USENIX Security Symposium.

[2]  Steven M. Bellovin,et al.  Privee: An Architecture for Automatically Analyzing Web Privacy Policies , 2014, USENIX Security Symposium.

[3]  Penelope Sibun,et al.  A Practical Part-of-Speech Tagger , 1992, ANLP.

[4]  Stephen E. Robertson,et al.  Some simple effective approximations to the 2-Poisson model for probabilistic weighted retrieval , 1994, SIGIR '94.

[5]  D M Faissol,et al.  Taxonomies of Cyber Adversaries and Attacks: A Survey of Incidents and Approaches , 2009 .

[6]  George A. Miller,et al.  WordNet: A Lexical Database for English , 1995, HLT.

[7]  Josef van Genabith,et al.  Coling 2008: Proceedings of the workshop on Cross-Framework and Cross-Domain Parser Evaluation , 2008, COLING 2008.

[8]  Tao Xie,et al.  WHYPER: Towards Automating Risk Assessment of Mobile Applications , 2013, USENIX Security Symposium.

[9]  Christopher D. Manning,et al.  The Stanford Typed Dependencies Representation , 2008, CF+CDPE@COLING.

[10]  Zhong Chen,et al.  AutoCog: Measuring the Description-to-permission Fidelity in Android Applications , 2014, CCS.

[11]  M. Stone Cross‐Validatory Choice and Assessment of Statistical Predictions , 1976 .

[12]  Eric Michael Hutchins,et al.  Intelligence-Driven Computer Network Defense Informed by Analysis of Adversary Campaigns and Intrusion Kill Chains , 2010 .

[13]  N. F. Noy,et al.  Ontology Development 101: A Guide to Creating Your First Ontology , 2001 .

[14]  Ronald D. Williams,et al.  Taxonomies of attacks and vulnerabilities in computer systems , 2008, IEEE Communications Surveys & Tutorials.

[15]  Tudor Dumitras,et al.  FeatureSmith: Automatically Engineering Features for Malware Detection by Mining the Security Literature , 2016, CCS.

[16]  Zhou Li,et al.  Acing the IOC Game: Toward Automatic Discovery and Analysis of Open-Source Cyber Threat Intelligence , 2016, CCS.

[17]  Leo Obrst,et al.  Developing an Ontology of the Cyber Security Domain , 2012, STIDS.