Quantitative Analysis of Literary Styles

Writers are often viewed as having an inherent style that can serve as a literary fingerprint. By quantifying features related to literary style, one may hope to classify written works and even attribute authorship to newly discovered texts. Beyond its intrinsic interest, the study of literary styles offers an opportunity to introduce and motivate many standard multivariate statistical techniques. Today the statistical analysis of literary styles is made much simpler by the wealth of real data readily available from the Internet. This article presents an overview and brief history of the analysis of literary styles. In addition, we use canonical discriminant analysis and principal component analysis to identify structure in the data and distinguish authorship.
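To make the two techniques named above concrete, the following is a minimal sketch in Python using only NumPy. It is not the article's own analysis: the data are synthetic, the four features stand in for hypothetical function-word rates per 1000 words, and the author labels, means, and helper names (`pca`, `canonical_discriminants`, `nearest_centroid_predict`) are assumptions made purely for illustration.

```python
import numpy as np

def pca(X, n_components=2):
    """Project the rows of X onto the leading principal components."""
    Xc = X - X.mean(axis=0)
    # SVD of the centered data: the rows of Vt are the principal directions.
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = s**2 / np.sum(s**2)  # share of variance per component
    return Xc @ Vt[:n_components].T, explained[:n_components]

def canonical_discriminants(X, y):
    """Directions maximizing between-class relative to within-class scatter."""
    classes = np.unique(y)
    p = X.shape[1]
    grand_mean = X.mean(axis=0)
    W = np.zeros((p, p))  # within-class scatter
    B = np.zeros((p, p))  # between-class scatter
    for c in classes:
        Xg = X[y == c]
        d = Xg - Xg.mean(axis=0)
        W += d.T @ d
        m = (Xg.mean(axis=0) - grand_mean)[:, None]
        B += len(Xg) * (m @ m.T)
    # Eigenvectors of W^{-1} B; at most (number of classes - 1) are informative.
    evals, evecs = np.linalg.eig(np.linalg.solve(W, B))
    order = np.argsort(evals.real)[::-1][: len(classes) - 1]
    return evecs.real[:, order]

def nearest_centroid_predict(Z, y):
    """Assign each row of Z to the class with the closest centroid."""
    classes = np.unique(y)
    centroids = np.stack([Z[y == c].mean(axis=0) for c in classes])
    dists = np.linalg.norm(Z[:, None, :] - centroids[None, :, :], axis=2)
    return classes[np.argmin(dists, axis=1)]

# Synthetic data (an assumption, not real texts): 30 "texts" per author,
# 4 hypothetical function-word rates per 1000 words.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal([30.0, 12.0, 8.0, 5.0], 1.5, size=(30, 4)),  # hypothetical author A
    rng.normal([26.0, 15.0, 8.0, 6.0], 1.5, size=(30, 4)),  # hypothetical author B
])
y = np.array([0] * 30 + [1] * 30)

scores, explained = pca(X, n_components=2)
directions = canonical_discriminants(X, y)
predicted = nearest_centroid_predict(X @ directions, y)
accuracy = float(np.mean(predicted == y))

print("variance explained by first two components:", explained)
print("resubstitution accuracy:", accuracy)
```

The accuracy printed here is a resubstitution rate, computed on the same texts used to fit the discriminant, so it is optimistic. A leave-one-out estimate would give a fairer picture of how well authorship can be distinguished.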
