Vocabulary Richness Measure in Genres

Abstract: This article deals with one of the oldest and most traditional topics in quantitative linguistics, vocabulary richness. Although several methods for measuring vocabulary richness exist, all of them are influenced by text size. The authors therefore propose a new measure of vocabulary richness that does not depend on text length. In the second part of the article, the new method is applied to a genre analysis of texts by the Czech writer Karel Čapek. The method is also used to study differences between authors and between languages.
