Authorship Verification in the Absence of Explicit Features and Thresholds

Enhancing information retrieval systems with the ability to take people's writing style into account opens the door to a number of applications. For example, articles can be linked by authorship, which can help identify authors who spread hoaxes and deliberate misinformation in news stories across different platforms. Authorship verification (AV) is a technique that can be used for this purpose. AV is the task of judging whether two or more documents stem from the same author. The majority of existing AV approaches rely on machine learning concepts based on explicitly defined stylistic features and on complex models that involve a considerable number of parameters. Moreover, many existing AV methods depend on explicit thresholds (needed to accept or reject a stated authorship), which are determined on training corpora. We propose a novel parameter-free AV approach that derives its thresholds for each verification case individually and enables AV in the absence of explicit features and training corpora. In an experimental setup based on eight evaluation corpora (each in a different language), we show that our approach yields competitive results against the current state of the art and other noteworthy AV baselines.
