An Argument for Linguistic Expertise in Cyberthreat Analysis: LOLSec in Russian Language eCrime Landscape

In this position paper, we argue for a holistic perspective on threat analysis and other studies of state-sponsored or state-aligned eCrime groups. Specifically, we argue that understanding eCrime requires approaching it as a sociotechnical system, and that studying such a system requires combining linguistic, regional, professional, and technical expertise. To illustrate this, we focus on the discourse of the Conti ransomware group in the context of the Russian invasion of Ukraine. We discuss the background of this group and its actions, and argue that a purely technical approach can miss important aspects specific to the cultural and linguistic context, such as language, slang, and humor. We provide examples of how the discourse and threats from such groups can easily be misunderstood without appropriate linguistic and domain expertise.
