HinCTI: A Cyber Threat Intelligence Modeling and Identification System Based on Heterogeneous Information Network

A rising number of organizations are willing to leverage cyber threat intelligence (CTI) to obtain a full picture of the cyber threat situation. Because labels are scarce for the cyber threat infrastructure nodes involved in CTI, automatically identifying the threat type of these nodes for early warning is challenging. To address this, a practical system called HinCTI is developed for modeling cyber threat intelligence and identifying threat types. We first design a threat intelligence meta-schema to depict the semantic relatedness of infrastructure nodes. We then model CTI on a heterogeneous information network (HIN). Next, we define a meta-path and meta-graph instances-based threat infrastructure similarity (MIIS) measure between threat infrastructure nodes and present a MIIS measure-based heterogeneous graph convolutional network (GCN) approach to identify the threat types of infrastructure nodes involved in CTI. To the best of our knowledge, this work is the first to model CTI on HIN for threat identification and to propose a heterogeneous GCN-based approach for threat type identification of infrastructure nodes. With HinCTI, comprehensive experiments are conducted on real-world datasets, and the results demonstrate that the proposed approach significantly improves threat type identification performance over existing state-of-the-art baseline methods.
