Multi-Platform Authorship Verification

Messaging systems such as social network messaging, text messages (SMS), email and Twitter have grown rapidly in variety and popularity, and users frequently exchange messages across several platforms. Unfortunately, amongst the legitimate messages there is a host of illegitimate and inappropriate content, with cyber stalking, trolling and computer-assisted crime all taking place. There is therefore a need to identify individuals using messaging systems. Stylometry is the study of linguistic features in a text; applied to authorship verification, it checks whether a target text was written by a specific individual author, based on that author's writing style. Whilst much research has taken place within authorship verification, studies have focused upon single platforms and have often used limited datasets and restricted methodologies, making it difficult to appreciate the real-world value of the approach. This paper seeks to overcome these limitations by providing an analysis of authorship verification across four common messaging systems. This approach enables a direct comparison of recognition performance and provides a basis for analyzing the feature vectors across platforms to better understand which aspects each platform capitalizes upon in order to achieve good classification. The experiments also investigate feature vector creation, using population-based and user-based techniques to compare and contrast performance. The experiment involved 50 participants across four common platforms, with a total of 13,617; 106,359; 4,539; and 6,540 samples for Twitter, SMS, Facebook and Email, achieving an Equal Error Rate (EER) of 20.16%, 7.97%, 25% and 13.11% respectively.
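The Equal Error Rate reported above is the operating point at which the false acceptance rate (impostor messages accepted as the claimed author) equals the false rejection rate (genuine messages rejected). The abstract does not say how the EER was computed, so the following is only a minimal sketch under stated assumptions: each message receives a similarity score where a higher score means "more likely the claimed author", and the function name `equal_error_rate` and the sample scores are hypothetical illustrations, not data or code from the paper.

```python
def equal_error_rate(genuine_scores, impostor_scores):
    """Approximate the EER by sweeping a decision threshold.

    Assumption: a sample is accepted when its score >= threshold,
    so higher scores mean "more likely the claimed author".
    Returns (eer, threshold) at the point where FAR and FRR are closest.
    """
    candidates = sorted(set(genuine_scores) | set(impostor_scores))
    best = None  # (gap between FAR and FRR, EER estimate, threshold)
    for t in candidates:
        # False acceptance rate: impostor samples wrongly accepted.
        far = sum(s >= t for s in impostor_scores) / len(impostor_scores)
        # False rejection rate: genuine samples wrongly rejected.
        frr = sum(s < t for s in genuine_scores) / len(genuine_scores)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2, t)
    return best[1], best[2]


# Hypothetical scores for one user: four genuine and four impostor messages.
genuine = [0.9, 0.8, 0.7, 0.4]
impostor = [0.6, 0.3, 0.2, 0.1]

eer, threshold = equal_error_rate(genuine, impostor)
print(f"EER = {eer:.2%} at threshold {threshold}")  # EER = 25.00% at threshold 0.6
```

With finitely many samples the two error rates rarely cross exactly, so this sketch reports the average of the two rates at the threshold where they are closest; interpolating between adjacent thresholds is a common alternative.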
