Authorship Analysis on Dark Marketplace Forums

Anonymity networks such as Tor host many underground markets and discussion forums dedicated to the trade of illegal goods and services. As these venues gain popularity, analyzing their content and users is becoming increasingly urgent for many parties, ranging from law enforcement and security agencies to financial institutions. A major obstacle in cyber forensics is that anonymization techniques such as Tor's onion routing make it very difficult to trace the identities of suspects. In this paper we propose classification set-ups for two tasks related to user identification: alias classification and authorship attribution. We apply our techniques to data from a Tor discussion forum mainly dedicated to drug trafficking and show that, for both tasks, a combination of character-level n-grams, stylometric features, and timestamp features of the user posts achieves high accuracy.
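To make the feature combination concrete, the following is a minimal sketch of how character-level n-gram counts and posting-time counts might be merged into one profile per alias. It is not the set-up evaluated in the paper: the function names, the trigram length, the hour-of-day histogram, and the unweighted mixing of the two feature families are all assumptions made for illustration, and cosine similarity stands in for a trained classifier.

```python
from collections import Counter
from datetime import datetime
import math


def char_ngrams(text, n=3):
    """Count overlapping character n-grams in a lowercased post."""
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def hour_histogram(timestamps):
    """Count posts per hour of day (0-23) from ISO 8601 timestamp strings."""
    return Counter(datetime.fromisoformat(ts).hour for ts in timestamps)


def user_profile(posts, n=3):
    """Build one sparse feature vector for an alias (hypothetical helper).

    `posts` is a list of (text, iso_timestamp) pairs. Keys are tagged with
    their feature family so n-grams and hours cannot collide.
    """
    features = Counter()
    for text, _ in posts:
        for gram, count in char_ngrams(text, n).items():
            features[("ngram", gram)] += count
    for hour, count in hour_histogram(ts for _, ts in posts).items():
        features[("hour", hour)] += count
    return features


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(value * b[key] for key, value in a.items() if key in b)
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Toy data (invented): two aliases with similar wording and posting hours,
# and a third alias that differs on both.
alias_a = [("shipping is fast, stealth is great!!", "2015-03-01T22:10:00")]
alias_b = [("stealth is great!! shipping is fast", "2015-03-02T22:40:00")]
alias_c = [("Kind regards. Delivery was acceptable.", "2015-03-01T09:05:00")]

sim_ab = cosine(user_profile(alias_a), user_profile(alias_b))
sim_ac = cosine(user_profile(alias_a), user_profile(alias_c))
```

In this toy setting `sim_ab` comes out higher than `sim_ac`, because the first two aliases share most trigrams and the same posting hour. A realistic pipeline would normalize each feature family separately and feed the vectors to a supervised classifier rather than compare raw counts.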
