Timeprints for identifying social media users with multiple aliases

Many people who discuss sensitive or private issues on social media services use pseudonyms or aliases to avoid revealing their true identity, while using their usual, non-private accounts when posting on less sensitive issues. Previous research has shown that if such individuals post large amounts of user-generated content, stylometric techniques can identify the author from the characteristics of the text. In this article we show how an author's identity can be unmasked in a similar way using time features (e.g., the period of the day and the day of the week when a user's posts were published). We combine several time features into a timeprint, which can be seen as a type of fingerprint for identifying users on social media. We use supervised machine learning (i.e., author identification) and unsupervised alias matching (similarity detection) in a number of experiments with forum data to assess the extent to which timeprints can be used to identify users in social media, both in isolation and in combination with stylometric features. The results show that timeprints can be a powerful tool for both author identification and alias matching in social media.
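To make the idea of a timeprint concrete, the following is a minimal Python sketch under stated assumptions, not the method used in the article. It assumes a timeprint made of only the two features named above (share of posts per hour of the day and per day of the week), and it assumes cosine similarity as the score for unsupervised alias matching; the abstract does not specify the full feature set or the similarity measure. The function names `timeprint`, `cosine_similarity`, and `best_alias_match` are hypothetical.

```python
import math
from collections import Counter
from datetime import datetime


def timeprint(timestamps):
    """Build a 31-dimensional vector from post timestamps.

    The first 24 entries are the share of posts per hour of the day,
    the last 7 are the share of posts per day of the week (Monday = 0).
    This feature set is an assumption made for illustration.
    """
    if not timestamps:
        raise ValueError("need at least one timestamp")
    n = len(timestamps)
    hours = Counter(ts.hour for ts in timestamps)
    days = Counter(ts.weekday() for ts in timestamps)
    return [hours[h] / n for h in range(24)] + [days[d] / n for d in range(7)]


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def best_alias_match(query_timestamps, candidates):
    """Return (name, score) of the candidate whose timeprint is most similar.

    `candidates` maps a user name to that user's list of post timestamps.
    """
    query = timeprint(query_timestamps)
    scores = {
        name: cosine_similarity(query, timeprint(stamps))
        for name, stamps in candidates.items()
    }
    best = max(scores, key=scores.get)
    return best, scores[best]
```

Called with the timestamps of an unknown alias and a dictionary of known users, `best_alias_match` returns the known user whose posting rhythm is closest under this score. A supervised variant would instead feed the same vectors, optionally concatenated with stylometric features, to a classifier.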
