Effects of Text Length on Lexical Diversity Measures: Using Short Texts with Less than 200 Tokens.

Abstract: Despite the importance of lexical diversity (LD) in L2 speaking and writing performance, LD measures are known to be affected by the number of words in the text being analyzed. This study aims to identify the LD measures that are least affected by text length and can therefore be used to analyze short L2 texts (50–200 tokens). We compared the type–token ratio, the Guiraud index, Maas, the measure of textual lexical diversity (MTLD), D, and HD-D to assess their robustness to variation in text length. Spoken texts of 200 tokens from 38 L2 English learners at the lower-intermediate level were divided into segments of 50–200 tokens, and the impact of text length was examined. We found that MTLD was least affected by text length across most ranges, but was somewhat affected across the 50–150 and 50–200 token ranges. We further observed low correlations between equal-sized texts of up to 100 tokens. These results suggest that MTLD can be used with texts of more than 100 tokens and that MTLD values can be compared between texts of 100 to 200 tokens. We also showed that D and HD-D produced similar results, which indicates that the two measures are comparable.
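As an illustration of the length sensitivity described above, the following minimal Python sketch computes three of the simpler measures named in the abstract on growing prefixes of one text. It uses the commonly cited definitions (TTR = V/N, Guiraud = V/√N, Maas a² = (log N − log V)/(log N)²), which are assumed here rather than taken from the study; the function names and the toy token list are hypothetical. MTLD, D, and HD-D are omitted because they need more involved procedures.

```python
import math

def ttr(tokens):
    """Type-token ratio: number of distinct types divided by number of tokens."""
    return len(set(tokens)) / len(tokens)

def guiraud(tokens):
    """Guiraud index: types divided by the square root of tokens."""
    return len(set(tokens)) / math.sqrt(len(tokens))

def maas(tokens):
    """Maas a^2: (log N - log V) / (log N)^2; requires at least 2 tokens."""
    n, v = len(tokens), len(set(tokens))
    return (math.log(n) - math.log(v)) / math.log(n) ** 2

def prefix_scores(tokens, lengths=(50, 100, 150, 200)):
    """Score the first k tokens for each requested length k that fits the text."""
    scores = {}
    for k in lengths:
        if k <= len(tokens):
            segment = tokens[:k]
            scores[k] = {
                "ttr": ttr(segment),
                "guiraud": guiraud(segment),
                "maas": maas(segment),
            }
    return scores

# Hypothetical toy text: 200 tokens cycling through a 40-word vocabulary.
toy_tokens = [f"w{i % 40}" for i in range(200)]

if __name__ == "__main__":
    for length, values in prefix_scores(toy_tokens).items():
        print(length, values)
```

In this toy text no new types appear after the first 40 tokens, so all three values shift as the prefix grows even though the speaker's vocabulary is unchanged. Real learner data would show a less extreme pattern, but this is the kind of length effect the study examines.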
