Another Perspective on Vocabulary Richness

This article examines the usefulness of vocabulary richness for authorship attribution and tests the assumption that appropriate measures of vocabulary richness can capture an author's distinctive style or identity. After briefly discussing perceived and actual vocabulary richness, I show that doubling and combining texts affects some measures in computationally predictable but conceptually surprising ways. I discuss theoretical and empirical problems with some of these measures and develop simple methods to test how well vocabulary richness distinguishes texts by different authors. These methods show that vocabulary richness is ineffective for large groups of texts because of the extreme variability within and among them. I conclude that vocabulary richness is of marginal value in stylistic and authorship studies because the basic assumption that it constitutes a wordprint for authors is false.
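The doubling effect can be illustrated with a minimal Python sketch. It assumes two common measures chosen only for illustration, the type-token ratio and Yule's characteristic K. The abstract does not list the measures the article actually examines, and the sample sentence is invented. Doubling a text (concatenating it with itself) leaves its vocabulary unchanged, yet the type-token ratio halves exactly, and Yule's K, which is meant to be independent of text length, shifts by a predictable amount.

```python
from collections import Counter


def type_token_ratio(tokens):
    """Distinct word types divided by total tokens (illustrative measure)."""
    return len(set(tokens)) / len(tokens)


def yules_k(tokens):
    """Yule's characteristic K = 10^4 * (M2 - N) / N^2,
    where N is the token count and M2 is the sum of squared type frequencies."""
    freqs = Counter(tokens).values()
    n = sum(freqs)
    m2 = sum(f * f for f in freqs)
    return 10_000 * (m2 - n) / (n * n)


if __name__ == "__main__":
    # Hypothetical toy text; any tokenized text would do.
    text = "the cat sat on the mat and the dog sat on the rug".split()
    doubled = text + text  # same vocabulary, every frequency doubled

    print("TTR:", type_token_ratio(text), "->", type_token_ratio(doubled))
    print("K:  ", yules_k(text), "->", yules_k(doubled))
```

Because doubling multiplies every type frequency by 2, N becomes 2N and M2 becomes 4M2, so K' = 10^4(4M2 - 2N)/(4N^2) = K + 5000/N: a small but systematic rise in K for a text whose vocabulary has not changed at all.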
