Overview of the Author Obfuscation Task at PAN 2018: A New Approach to Measuring Safety

In this paper, we evaluate seven author obfuscation approaches, which are supposed to automatically mask an author’s writing style in a given text so as to render automatic author identification impossible. The approaches are evaluated with regard to their safety, soundness, and sensibleness: respectively, whether they beat 44 author identification approaches, retain the original meaning of the obfuscated text, and produce inconspicuous, human-readable obfuscations. Regarding the measurement of safety in particular, we introduce a set of new performance measures designed to keep the performance of obfuscation approaches comparable as the numbers of author identification approaches and evaluation datasets increase, incorporating their respective performance and quality. Based on the new measures, we establish a world ranking of obfuscators.
