Versification and authorship attribution. A pilot study on Czech, German, Spanish, and English poetry

This article describes pilot experiments performed as part of a long-term project examining the possibilities for using versification analysis to determine the authorship of poetic texts. Since we are addressing this article to both stylometry experts and experts in the study of verse, we first introduce in detail the common classifiers used in contemporary stylometry (Burrows’ Delta, Argamon’s Quadratic Delta, Smith-Aldridge’s Cosine Delta, and the Support Vector Machine) and explain how they work via graphic examples. We then evaluate these classifiers’ performance when used with the versification features found in Czech, German, Spanish, and English poetry. We conclude that versification is a reasonable stylometric marker, the strength of which is comparable to that of other markers traditionally used in stylometry (such as the frequencies of the most frequent words and the frequencies of the most frequent character n-grams).
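For readers who have not met the Delta family before, the three distance measures named above can be sketched in a few lines of Python. This is a minimal illustration based on the commonly cited definitions of these measures, not the implementation used in the study. The feature values, the author labels, and the helper names (`zscore_columns`, `nearest_author`) are all hypothetical, and Quadratic Delta is written here as a plain sum of squared differences, although some formulations rescale it.

```python
import math
from statistics import mean, pstdev


def zscore_columns(rows):
    """Standardize each feature (column) to mean 0 and standard deviation 1."""
    columns = list(zip(*rows))
    mus = [mean(c) for c in columns]
    sigmas = [pstdev(c) or 1.0 for c in columns]  # guard against constant features
    return [[(x - m) / s for x, m, s in zip(row, mus, sigmas)] for row in rows]


def burrows_delta(a, b):
    """Mean absolute difference of z-scores (a scaled Manhattan distance)."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def quadratic_delta(a, b):
    """Sum of squared differences of z-scores (squared Euclidean distance)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cosine_delta(a, b):
    """One minus the cosine of the angle between the two z-score vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def nearest_author(profiles, disputed, distance):
    """Return the label of the candidate profile closest to the disputed text."""
    labels = list(profiles)
    # Standardize candidates and the disputed text together (an assumption made
    # for this sketch; other setups standardize on the reference corpus only).
    z = zscore_columns([profiles[k] for k in labels] + [disputed])
    target = z[-1]
    return min(labels, key=lambda k: distance(z[labels.index(k)], target))


# Hypothetical versification features, e.g. stress frequencies at three
# metrical positions. These numbers are invented for illustration only.
profiles = {
    "A": [0.90, 0.20, 0.70],
    "B": [0.60, 0.50, 0.40],
    "C": [0.30, 0.80, 0.10],
}
disputed = [0.85, 0.25, 0.65]

for distance in (burrows_delta, quadratic_delta, cosine_delta):
    print(distance.__name__, nearest_author(profiles, disputed, distance))
```

All three measures operate on the same standardized feature vectors and differ only in how the distance between two vectors is computed, which is why they can be swapped behind a single `distance` argument.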
