Cutting the Gordian Knot: The Moving-Average Type–Token Ratio (MATTR)

Abstract: Type–token ratio (TTR), or vocabulary size divided by text length (V/N), is a time-honoured but unsatisfactory measure of lexical diversity. The problem is that the TTR of a text sample is affected by the sample's length. We present an algorithm for rapidly computing TTR over a moving window, which yields a measure that is independent of text length, and we demonstrate that this measure can detect changes within a text as well as differences between texts.
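The moving-window idea can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `ttr` and `mattr`, the default window of 50 tokens, and the fallback to plain TTR for texts shorter than the window are all assumptions made for illustration. The sketch computes V/N inside each window of fixed size, updates the type count incrementally as the window slides one token at a time, and averages the per-window ratios.

```python
from collections import Counter


def ttr(tokens):
    """Plain type-token ratio V/N for a list of tokens."""
    if not tokens:
        raise ValueError("tokens must not be empty")
    return len(set(tokens)) / len(tokens)


def mattr(tokens, window=50):
    """Mean TTR over all windows of `window` consecutive tokens.

    `window=50` is an illustrative default, not a value taken from the paper.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    n = len(tokens)
    if n <= window:
        # Assumption: fall back to plain TTR when the text is shorter
        # than one full window.
        return ttr(tokens)

    counts = Counter(tokens[:window])  # token frequencies in the current window
    types = len(counts)                # number of distinct tokens in the window
    total = types / window             # running sum of per-window TTRs

    for i in range(window, n):
        # Drop the token that leaves the window on the left.
        outgoing = tokens[i - window]
        counts[outgoing] -= 1
        if counts[outgoing] == 0:
            del counts[outgoing]
            types -= 1
        # Add the token that enters the window on the right.
        incoming = tokens[i]
        if counts[incoming] == 0:  # Counter returns 0 for absent keys
            types += 1
        counts[incoming] += 1
        total += types / window

    return total / (n - window + 1)
```

Because each slide touches only the outgoing and incoming tokens, the cost is linear in the text length rather than proportional to text length times window size. In a toy text that repeats the same four words, plain TTR falls as the text grows while the windowed value with `window=4` stays constant, which is the length dependence the abstract describes.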
