On the Empirical Evaluation of a Hybrid Author Identification Method

In this paper we focus on identifying the author of a written text. We present a new hybrid method that combines a set of stylistic and statistical features in a machine learning process. We tested the effectiveness of these linguistic and statistical features, combined with the inter-textual distance "Delta", on the PAN@CLEF 2015 English corpus and obtained a c@1 score of 0.59.
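For readers unfamiliar with the two measures named above, the following is a minimal sketch, not the authors' implementation. It assumes the standard definition of c@1, which credits unanswered problems in proportion to the accuracy on the whole set, and the classic form of Burrows' Delta, the mean absolute difference of z-scored word frequencies. The function names, the use of the population standard deviation, and the choice to skip words with zero variance are all assumptions made for illustration.

```python
from collections import Counter
from statistics import mean, pstdev


def c_at_1(n_correct, n_unanswered, n_total):
    """c@1 = (n_c + n_u * n_c / n) / n, where n_c is the number of correct
    answers, n_u the number of unanswered problems, n the total."""
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    return (n_correct + n_unanswered * n_correct / n_total) / n_total


def relative_frequencies(tokens, vocabulary):
    """Relative frequency of each vocabulary word in a tokenized text."""
    counts = Counter(tokens)
    total = len(tokens)
    return [counts[word] / total for word in vocabulary]


def burrows_delta(profile_a, profile_b, reference_profiles):
    """Mean absolute difference between the z-scores of two frequency
    profiles. Means and standard deviations are estimated per word from
    reference_profiles (one profile per reference text)."""
    differences = []
    for i, (freq_a, freq_b) in enumerate(zip(profile_a, profile_b)):
        column = [profile[i] for profile in reference_profiles]
        sigma = pstdev(column)  # assumption: population standard deviation
        if sigma == 0:
            continue  # assumption: ignore words with no variation
        mu = mean(column)
        z_a = (freq_a - mu) / sigma
        z_b = (freq_b - mu) / sigma
        differences.append(abs(z_a - z_b))
    return mean(differences) if differences else 0.0
```

As a usage example, a system that answers 5 of 10 problems correctly and leaves 2 unanswered gets `c_at_1(5, 2, 10)`, which is (5 + 2 * 0.5) / 10 = 0.6, whereas the same 5 correct answers with no abstentions give 0.5. A smaller Delta between two texts indicates more similar word-frequency profiles.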
