FOI Cross-Domain Authorship Attribution for Criminal Investigations

Authorship attribution techniques have existed for a long time, but they are seldom evaluated under conditions resembling the real-world scenarios in which they must work if they are to be useful tools in criminal investigations involving digital communication. We used an SVM classifier as a base, to which we added two sets of hand-crafted stylometric features, and evaluated it on data from the PAN-CLEF 2019 cross-domain authorship attribution task. Our classifiers outperform the baseline systems against which they were compared.
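To make the idea of hand-crafted stylometric features concrete, here is a minimal sketch of how a text might be turned into a feature vector. The abstract does not list the authors' two feature sets, so everything in it is an illustrative assumption: the specific features (average word length, type-token ratio, punctuation ratio, function-word frequencies), the short function-word list, and the name `stylometric_features`.

```python
import string
from collections import Counter

# Hypothetical, deliberately short function-word list (an assumption,
# not the feature set used in the paper).
FUNCTION_WORDS = ("the", "and", "of", "to", "in", "that", "is", "it")


def stylometric_features(text):
    """Map a text to a dict of simple, topic-independent style features."""
    # Lowercase, split on whitespace, and strip surrounding punctuation.
    tokens = [w.strip(string.punctuation) for w in text.lower().split()]
    tokens = [t for t in tokens if t]

    # Guard against division by zero on empty input.
    n_tokens = max(len(tokens), 1)
    n_chars = max(len(text), 1)
    counts = Counter(tokens)

    features = {
        "avg_word_len": sum(len(t) for t in tokens) / n_tokens,
        "type_token_ratio": len(counts) / n_tokens,
        "punct_ratio": sum(ch in string.punctuation for ch in text) / n_chars,
    }
    # Relative frequency of each function word.
    for word in FUNCTION_WORDS:
        features["fw_" + word] = counts[word] / n_tokens
    return features


if __name__ == "__main__":
    for name, value in stylometric_features("The cat and the dog.").items():
        print(name, round(value, 3))
```

Vectors like these could then be passed, alongside other features, to an SVM implementation such as scikit-learn's `LinearSVC`. Features of this kind are often preferred in cross-domain settings because they depend less on topic than on writing habits.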
