Conversationally-inspired stylometric features for authorship attribution in instant messaging

Authorship attribution (AA) aims to automatically recognize the author of a given text sample. Traditionally applied to literary texts, AA now faces the new challenge of recognizing the identity of people involved in chat conversations. Chat conversations share many aspects with spoken conversations, but AA approaches have not taken this into account so far. This paper addresses that gap and proposes two novelties that improve the effectiveness of traditional AA approaches for this type of data: the first is to adopt features inspired by Conversation Analysis (in particular, turn-taking); the second is to extract the features from individual turns rather than from entire conversations. The experiments were performed on a corpus of dyadic chat conversations (77 individuals in total). The performance in identifying the persons involved in each exchange, measured as the area under the Cumulative Match Characteristic curve, is 89.5%.
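
As an illustration of the evaluation measure named above, the following is a minimal sketch of how a Cumulative Match Characteristic (CMC) curve and its normalized area can be computed from a matrix of probe-to-gallery distances. The function names, the toy distance matrix, and the convention that probe i truly matches gallery subject i are assumptions made for this example; the abstract does not specify the matching procedure or the exact normalization used.

```python
def cmc_curve(dist):
    """Return the CMC curve for a square distance matrix.

    Assumption for this sketch: dist[i][j] is the distance between probe i
    and gallery subject j, and the true match of probe i is gallery subject i.
    Entry k of the result is the fraction of probes whose true match is
    among the k + 1 closest gallery subjects.
    """
    n = len(dist)
    hits = [0] * n
    for i, row in enumerate(dist):
        # 0-based rank of the true match: how many gallery subjects
        # are strictly closer to the probe than the correct one.
        rank = sum(1 for d in row if d < row[i])
        hits[rank] += 1
    curve, total = [], 0
    for h in hits:
        total += h
        curve.append(total / n)
    return curve


def normalized_auc(curve):
    """Area under the CMC curve, normalized to [0, 1] by averaging over ranks."""
    return sum(curve) / len(curve)


# Toy example with three subjects (made-up distances, not data from the paper).
example_dist = [
    [0.1, 0.5, 0.9],  # true match is closest
    [0.2, 0.3, 0.1],  # true match is third closest
    [0.4, 0.6, 0.5],  # true match is second closest
]
curve = cmc_curve(example_dist)
area = normalized_auc(curve)
```

Under this convention, a system that always ranks the correct author first yields a curve that is 1 at every rank and therefore a normalized area of 1, so values closer to 1 indicate better identification.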
