Overview of the Author Obfuscation Task at PAN 2017: Safety Evaluation Revisited

We report on the second large-scale evaluation of style obfuscation approaches in a shared task on author obfuscation, organized at the PAN 2017 lab on digital text forensics. Author obfuscation means automatically paraphrasing a given text such that state-of-the-art authorship verification approaches misjudge a given pair of documents as having been written by “different authors” when they would have decided otherwise without obfuscation. This year, two new obfuscators are compared with last year’s participants against a total of 44 authorship verification approaches. The best-performing obfuscator significantly impacts the decisions of the authorship verifiers. However, as in the previous year, the paraphrased texts are often no longer really human-readable and their context is partly changed, indicating that there is still a way to go to “perfect” automatic obfuscation that (1) tricks verification approaches, (2) keeps the meaning of the original, and (3) is, regarding its obfuscation, unsuspicious to a human eye.
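The safety criterion in the abstract (a verifier that said "same author" before obfuscation says "different authors" afterwards) can be illustrated with a minimal sketch. The function name `flip_rate` and the boolean encoding of verifier decisions are assumptions made for illustration; this is not the measure used in the task's official evaluation.

```python
from typing import Sequence


def flip_rate(before: Sequence[bool], after: Sequence[bool]) -> float:
    """Fraction of 'same author' decisions that turn into 'different authors'.

    before[i] / after[i] are one verifier's decisions on document pair i,
    without and with obfuscation (True means 'same author').
    Hypothetical helper, not part of any PAN tooling.
    """
    if len(before) != len(after):
        raise ValueError("decision lists must have the same length")

    # Only pairs originally judged 'same author' can be flipped by obfuscation.
    same_before = [i for i, decision in enumerate(before) if decision]
    if not same_before:
        return 0.0

    flipped = sum(1 for i in same_before if not after[i])
    return flipped / len(same_before)


if __name__ == "__main__":
    # Toy decisions for four document pairs (made-up values).
    before = [True, True, False, True]
    after = [False, True, False, False]
    print(f"flip rate: {flip_rate(before, after):.2f}")
```

A higher value would suggest a more effective obfuscator against that verifier, but it says nothing about the other two criteria (preserved meaning and unsuspicious text), which need separate assessment.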
