Analyzing Stylometric Approaches to Author Obfuscation

Authorship attribution is an important and emerging security tool. However, just as criminals may wear gloves to hide their fingerprints, so too may criminal authors mask their writing styles to escape detection. Most authorship studies have focused on cooperative and/or unaware authors who do not take such precautions. This paper evaluates the methods implemented in the Java Graphical Authorship Attribution Program (JGAAP) against essays in the Brennan-Greenstadt obfuscation corpus that were written in deliberate attempts to mask style. The results demonstrate that many of the more robust and accurate methods implemented in JGAAP remain effective in the presence of active deception.
