The Federalist Papers revisited: A collaborative attribution scheme

This paper presents and evaluates a collaborative attribution strategy built on six authorship attribution schemes that represent the two main paradigms used in authorship studies. The classical paradigm (similarity-based methods) uses very frequent words as features and computes an intertextual distance between the disputed text and each author profile (the concatenation of that author's writings). The second paradigm applies machine learning schemes such as naive Bayes and support vector machines (SVM). As an evaluation corpus, we used The Federalist Papers, a well-known collection in authorship attribution. In our evaluation, we tried to follow the known recommendations and best practices for assessing the various attribution schemes. The evaluation shows that effective attribution schemes can be found in both paradigms. When these individual results are combined using a vote aggregation method, however, the final collaborative decision is always correct and robust. Moreover, to indicate the degree of belief attached to the combined attribution, we can consider the percentage of votes obtained by each possible assignment. When analyzing the output of the individual attribution schemes, we also found that the information they provide is difficult to interpret, at least for the end user.
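The vote aggregation step described above can be illustrated with a minimal sketch. It assumes a simple plurality vote in which each attribution scheme casts one vote for an author; the abstract does not specify the exact aggregation rule, so the function name `aggregate_votes`, the tie handling, and the example votes are all hypothetical and serve only to show how a percentage of votes could express the degree of belief.

```python
from collections import Counter


def aggregate_votes(votes):
    """Combine one author label per attribution scheme by plurality vote.

    Returns (winner, shares), where shares maps each proposed author to
    the percentage of votes received. winner is None when the top count
    is tied. This is an assumed rule, not the paper's documented method.
    """
    if not votes:
        raise ValueError("at least one vote is required")
    counts = Counter(votes)
    total = len(votes)
    # Percentage of votes per candidate author, usable as a degree of belief.
    shares = {author: 100.0 * n / total for author, n in counts.items()}
    top = max(counts.values())
    leaders = [author for author, n in counts.items() if n == top]
    winner = leaders[0] if len(leaders) == 1 else None
    return winner, shares


# Hypothetical outputs of six attribution schemes for one disputed paper.
votes = ["Madison", "Madison", "Hamilton", "Madison", "Madison", "Madison"]
winner, shares = aggregate_votes(votes)
print(winner, shares)
```

With these illustrative votes, five of the six schemes agree, so the combined attribution carries about 83% of the votes. A lower share would signal a less certain assignment even when the winner is the same.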
