Detecting Cyber Threats in Non-English Hacker Forums: An Adversarial Cross-Lingual Knowledge Transfer Approach

The regularity of devastating cyber-attacks has made cybersecurity a grand societal challenge. Many cybersecurity professionals closely examine the international Dark Web to proactively pinpoint potential cyber threats. Despite this potential, the Dark Web contains hundreds of thousands of non-English posts. While machine translation (MT) is the prevailing approach to processing non-English text, applying MT to hacker forum text results in mistranslations. In this study, we draw upon Long Short-Term Memory (LSTM), Cross-Lingual Knowledge Transfer (CLKT), and Generative Adversarial Network (GAN) principles to design a novel Adversarial CLKT (A-CLKT) approach. A-CLKT operates on untranslated text to retain the original semantics of the language and leverages the collective knowledge about cyber threats across languages to create a language-invariant representation without any manual feature engineering or external resources. Three experiments demonstrate that A-CLKT outperforms state-of-the-art machine learning, deep learning, and CLKT algorithms in identifying cyber threats in French and Russian forums.
