Quantitative research methods and study quality in learner corpus research

Learner corpus research (LCR) has developed considerably since its inception some 25 years ago. Nevertheless, its theoretical, methodological and empirical advances have only rarely been summarized in the literature and, when they have, selectively rather than systematically. To the authors’ knowledge, no meta-analysis to date summarizes and synthesizes the body of knowledge resulting from learner corpus research in a specific area of study (e.g. English as a Foreign Language learners’ use of collocations, or tense, aspect and modality in learner writing). Equally concerning, relatively little attention has been paid to the state or development of the field’s methodological practices, an unfortunate circumstance given the empirical rigor needed to use corpus data reliably and accurately and to analyze frequencies of (co-)occurrence (Gries, 2013; Gries, forthcoming; Gries & Deshors, 2014). After all, progress in any discipline crucially “depends on sound research methods, principled data analysis, and transparent reporting practices” (Plonsky & Gass, 2011: 325). This study thus aims to provide the first empirical assessment of quantitative research methods and study quality in learner corpus research. Study quality is defined rather broadly as “(a) adherence to standards of contextually appropriate methodological rigor in research practices and (b) transparent and complete reporting of such practices” (Plonsky, 2013: 657). Specifically, we systematically review all quantitative primary studies referenced in the Learner Corpus Bibliography (LCB), a representative bibliography of learner corpus research that is maintained by the Learner Corpus Association (http://learnercorpusassociation.org) and currently contains approximately 1,180 references. The techniques used to retrieve, code, and analyze this body of primary research are characteristic of research synthesis and meta-analysis.
Following Plonsky (2013), however, this study differs from those traditions of synthetic research in that its focus is almost exclusively methodological (i.e. the “how” of learner corpus research) rather than substantive (i.e. the “what”). Each reference in the LCB is surveyed using a coding scheme inspired by the protocol that Plonsky & Gass (2011) developed and first used to assess methodological quality in second language acquisition (SLA), and more specifically in interaction research. The coding scheme has, however, been revised and expanded to account for the methodological characteristics of corpus linguistics. Quantitative studies are coded for over 50 categories representing six dimensions, including (a) publication type (i.e. conference paper, book chapter, journal article), (b) research focus (e.g. lexis, grammar), (c) methodological features (e.g. Contrastive Interlanguage Analysis, keyword analysis, error analysis, use of a reference corpus), (d) statistical analyses (e.g. χ² test, t-test, regression analysis), and (e) reporting practices (e.g. reliability coefficients, means). The 25-year span of research represented in the LCB provides a unique opportunity to examine the resulting data cumulatively and also permits analyses of changes over time in the research and reporting practices of this domain. Preliminary results point to several systematic strengths as well as many flaws, such as the absence of research questions or hypotheses, incomplete and inconsistent reporting practices (e.g. means without standard deviations), and low statistical power, alongside related analytical weaknesses: LCR studies generally over-rely on tests of statistical significance such as the χ² test, do not report effect sizes, rarely check or report whether statistical assumptions have been met, and rarely use multivariate analyses.
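To illustrate why a χ² statistic reported without an effect size says little about the size of a difference between corpora, the following Python sketch computes Pearson's χ² (without continuity correction) for a 2×2 frequency table, a target item versus all other tokens in a learner and a reference corpus, together with the φ coefficient. The frequencies and the function name `chi_square_2x2` are hypothetical and chosen only for illustration; φ is one common effect size for 2×2 tables and is not necessarily the measure coded in this study.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Return (chi2, p, phi) for a 2x2 frequency table (hypothetical layout):

                    target item   all other tokens
        learner         a               b
        reference       c               d
    """
    n = a + b + c + d
    # Pearson's chi-square for a 2x2 table, without Yates' continuity correction
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # With 1 degree of freedom, chi2 is the square of a standard normal variable,
    # so the upper-tail p-value is erfc(sqrt(chi2 / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    # Magnitude of phi: an effect size that does not grow with corpus size
    phi = math.sqrt(chi2 / n)
    return chi2, p, phi


# Identical relative frequencies (55 vs. 45 per 10,000 tokens), two corpus sizes
small = chi_square_2x2(55, 9_945, 45, 9_955)
large = chi_square_2x2(550, 99_450, 450, 99_550)

for label, (chi2, p, phi) in [("10k tokens each", small), ("100k tokens each", large)]:
    print(f"{label}: chi2 = {chi2:.2f}, p = {p:.4f}, phi = {phi:.4f}")
```

With the same relative frequencies, multiplying both corpora by ten multiplies χ² by ten (about 1.01 versus 10.05) and moves p from roughly .32 to below .01, while φ stays at about 0.007. Significance alone thus largely reflects corpus size, which is why reporting an effect size alongside the test matters.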
Improvements over time are nevertheless clearly noticeable, and there are signs that, like other related disciplines, learner corpus research is slowly “undergoing a change to becoming much more empirical, much more rigorous, and much more quantitative/statistical” (Gries, 2013: 287). In addition to providing direction for future research and research practices, the study’s findings will also be discussed and contextualized within the research cultures of corpus linguistics, second language acquisition, and applied linguistics more generally.
References
Gries, S. (2013). Statistical tests for the analysis of learner corpus data. In A. Díaz-Negrillo, N. Ballier & P. Thompson (Eds.), Automatic Treatment and Analysis of Learner Corpus Data. Amsterdam & Philadelphia: Benjamins.
Gries, S. (forthcoming). Statistics for learner corpus research. In S. Granger, G. Gilquin & F. Meunier (Eds.), The Cambridge Handbook of Learner Corpus Research. Cambridge: Cambridge University Press.
Gries, S., & Deshors, S. (2014). Using regressions to explore deviations between corpus data and a standard/target: Two suggestions. Corpora, 9(1), 109–136.
Plonsky, L. (2013). Study quality in SLA: An assessment of designs, analyses, and reporting practices in quantitative L2 research. Studies in Second Language Acquisition, 35, 655–687.
Plonsky, L., & Gass, S. (2011). Quantitative research methods, study quality, and outcomes: The case of interaction research. Language Learning, 61(2), 325–366.