Meta Analysis within Authorship Verification

In an authorship verification problem one is given writing examples from an author A and is asked to determine whether each text in question was in fact written by A. In a more general form of the problem one is given only a single document d, and the question is whether d contains sections from other authors. The heart of authorship verification is the quantification of an author's writing style, along with an outlier analysis to identify anomalies. Human readers are adept at detecting such spurious sections because they combine a highly developed sense for wording with context-dependent meta knowledge in their analysis. The intention of this paper is to compile an overview of the algorithmic building blocks for authorship verification. In particular, we introduce authorship verification problems as decision problems, discuss possibilities for the use of meta knowledge, and apply meta analysis to post-process unreliable style analysis results. Our meta analysis combines a confidence-based majority decision with the unmasking approach of Koppel and Schler. With this strategy we improve the analysis quality in our experiments by 33% in terms of the F-measure.
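
The combination of style quantification and outlier analysis described above can be illustrated with a minimal sketch. It is not the method of the paper: the two style markers (mean word length and mean sentence length), the z-score test, the threshold of 1.5, and the function names `style_features` and `outlier_sections` are all assumptions chosen for illustration.

```python
import re
import statistics


def style_features(text):
    """Return two illustrative style markers for a text section:
    (mean word length in characters, mean sentence length in words).
    These markers are assumptions, not the feature set of the paper."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not words or not sentences:
        return (0.0, 0.0)
    mean_word_len = sum(len(w) for w in words) / len(words)
    mean_sent_len = len(words) / len(sentences)
    return (mean_word_len, mean_sent_len)


def outlier_sections(sections, threshold=1.5):
    """Return the sorted indices of sections whose style deviates from the
    document average by more than `threshold` standard deviations in at
    least one feature (a simple z-score outlier test)."""
    feats = [style_features(s) for s in sections]
    flagged = set()
    for dim in range(2):
        values = [f[dim] for f in feats]
        mean = statistics.mean(values)
        std = statistics.pstdev(values)
        if std == 0:
            continue  # all sections agree on this feature
        for i, value in enumerate(values):
            if abs(value - mean) / std > threshold:
                flagged.add(i)
    return sorted(flagged)


if __name__ == "__main__":
    plain = "The cat sat. The dog ran. The sun set."
    ornate = ("Notwithstanding considerable methodological heterogeneity, "
              "contemporary stylometric investigations consistently "
              "demonstrate remarkable discriminative capability.")
    print(outlier_sections([plain] * 5 + [ornate]))
```

A per-section test of this kind is typically unreliable on short sections, which is the sort of result a subsequent meta analysis step would be meant to post-process.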
