On the utility of content analysis in author attribution: The Federalist

In studies of author attribution, the most common procedure is to measure differential use of function words, though lexical statistics are also often used. Content analysis has seldom been employed. We compare the success of lexical statistics, content analysis, and function words in classifying the 12 disputed Federalist papers. Mosteller and Wallace (1964) have, of course, presented overwhelming evidence that all 12 were written by James Madison rather than by Alexander Hamilton. Our purpose is not to challenge these attributions but rather to use The Federalist as a test case. We found lexical statistics to be of no use in classifying the disputed papers. With both classical canonical discriminant analysis and a neural-network approach, content-analytic measures (the Harvard III Psychosociological Dictionary and semantic differential indices) succeeded in attributing most of the disputed papers to Madison. However, a function-word approach is more successful. We argue that content analysis can be useful in cases where the function-word approach does not yield compelling conclusions and, perhaps, in preliminary screening in cases where there are many possible authors.
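As a rough illustration of what a function-word approach involves, the sketch below computes the rate per 1,000 tokens of a few marker words and assigns a disputed text to the author with the nearest mean profile. It is not the procedure used in the paper: the word list, the function names, and the nearest-profile rule (a simple stand-in for canonical discriminant analysis or a neural network) are all assumptions made for illustration.

```python
import re
from collections import Counter

# Hypothetical marker words, chosen for illustration only.
FUNCTION_WORDS = ("upon", "while", "whilst", "on", "by", "to")


def rates_per_thousand(text, words=FUNCTION_WORDS):
    """Return the rate per 1,000 tokens of each marker word in `text`."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid division by zero on empty text
    return {w: 1000.0 * counts[w] / total for w in words}


def nearest_author(disputed, profiles):
    """Pick the author whose rate profile is closest to `disputed`.

    `profiles` maps an author name to a dict of mean rates over the same
    marker words. Euclidean distance is an assumed, simplified criterion.
    """
    def dist(a, b):
        return sum((a[w] - b[w]) ** 2 for w in a) ** 0.5

    return min(profiles, key=lambda name: dist(disputed, profiles[name]))
```

In practice the author profiles would be averaged over each candidate's undisputed papers, and a proper discriminant model would also account for the variance of each rate rather than treating all marker words equally.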

[1]  Richard Frautschi Lexical and focal preferences in Rousseau's Profession de foi du Vicaire Savoyard (Book IV of Emile) , 1989, Comput. Humanit..

[2]  J. Kruskal Multidimensional scaling by optimizing goodness of fit to a nonmetric hypothesis , 1964 .

[3]  Thomas B. Horton,et al.  The effectiveness of the stylometry of function words in discriminating between Shakespeare and Fletcher , 1987 .

[4]  T. V. N. Merriam Marlowe’s Hand in Edward III , 1993 .

[5]  S. Fienberg,et al.  The Clockwork Muse: The Predictability of Artistic Change. , 1991 .

[6]  Ward E. Y. Elliott,et al.  A touchstone for the bard , 1991, Comput. Humanit..

[7]  H. Scarborough,et al.  Lexical correlates of cervical cancer. , 1978, Social science & medicine.

[8]  Charles W. Butler,et al.  Naturally intelligent systems , 1990 .

[9]  C. Martindale,et al.  Cognitive Psychology: A Neural-Network Approach , 1990 .

[10]  Ward E. Y. Elliott,et al.  Who Was Shakespeare , 1991 .

[11]  S. Freud The Psychopathology of Everyday Life , 1915 .

[12]  C. W. Anderson,et al.  Quantification of rewriting by the Brothers Grimm: A comparison of successive versions of three tales , 1989, Comput. Humanit..

[13]  Colin Martindale LEXSTAT: A PL/I program for computation of lexical statistics , 1974 .

[14]  R. Forsyth Neural learning algorithms: some empirical trials , 1990 .

[15]  J. Springer A Mechanical Solution of a Literary Problem , 1923 .

[16]  C. B. Williams A NOTE ON THE STATISTICAL ANALYSIS OF SENTENCE-LENGTH AS A CRITERION OF LITERARY STYLE , 2008 .

[17]  Lee Sigelman,et al.  The not-so-simple art of imitation: Pastiche, literary style, and Raymond Chandler , 1996, Comput. Humanit..

[18]  E. J. Anthony,et al.  EFFECTS OF PERINATAL ANOXIA AFTER SEVEN YEARS. , 1965, Psychological monographs.

[19]  J. M. Kittross The measurement of meaning , 1959 .

[20]  G. Udny Yule,et al.  The statistical study of literary vocabulary , 1944 .

[21]  Frederick Mosteller,et al.  Applied Bayesian and classical inference : the case of the Federalist papers , 1984 .

[22]  S. Fienberg,et al.  Inference and Disputed Authorship: The Federalist , 1966 .

[23]  Robert Matthews,et al.  Neural Computation in Stylometry I: An Application to the Works of Shakespeare and Fletcher , 1993 .

[24]  Louis A. Penner,et al.  A value analysis of the disputed Federalist papers. , 1970 .

[25]  Fred J. Damerau,et al.  The use of function word frequencies as indicators of style , 1975 .

[26]  D. R. Heise,et al.  Semantic differential profiles for 1000 most frequent English words , 1965 .

[27]  D. F. Specht,et al.  Generalization accuracy of probabilistic neural networks compared with backpropagation networks , 1991, IJCNN-91-Seattle International Joint Conference on Neural Networks.

[28]  Muriel Vasconcellos Long-term Data for an MT Policy , 1989 .

[29]  Richard Forsyth,et al.  Classification by similarity: An overview of statistical methods of case-based reasoning , 1995 .

[30]  S. Siegel,et al.  Nonparametric Statistics for the Behavioral Sciences , 2022, The SAGE Encyclopedia of Research Design.

[31]  David J. Hand,et al.  Discrimination and Classification , 1982 .

[32]  J. L. Hodges,et al.  Discriminatory Analysis - Nonparametric Discrimination: Consistency Properties , 1989 .

[33]  G. Yule ON SENTENCE-LENGTH AS A STATISTICAL CHARACTERISTIC OF STYLE IN PROSE: WITH APPLICATION TO TWO CASES OF DISPUTED AUTHORSHIP , 1939 .

[34]  Donald F. Specht,et al.  Probabilistic neural networks , 1990, Neural Networks.

[35]  Marshall S. Smith,et al.  The general inquirer: A computer approach to content analysis. , 1967 .