The handling of missing binary data in language research

Researchers are frequently confronted with unanswered questions or items on their questionnaires and tests, due to factors such as item difficulty, lack of testing time, or participant distraction. This paper first presents results from a poll confirming previous claims (Rietveld & van Hout, 2006; Schafer & Graham, 2002) that data replacement and deletion methods are common in research. Language researchers reported that, when faced with missing answers of the yes/no type (coded as zero or one in data tables), the three solutions they most commonly adopt are to exclude the participant’s data from the analyses, to leave the cell empty, or to fill it in with zero, as for an incorrect answer. The study then examines the impact on Cronbach’s α of five types of data insertion, using simulated and actual data with various numbers of participants and percentages of missing data. Our analyses indicate that the three most common methods we identified among language researchers are the ones with the greatest impact on Cronbach’s α coefficients; in other words, they are the least desirable solutions to the missing data problem. On the basis of our results, we make recommendations for language researchers concerning the best way to deal with missing data. Given that none of the most common simple methods works properly, we suggest that the missing data be replaced either by the item’s mean or by the participant’s overall mean to provide a better, more accurate picture of the instrument’s internal consistency.
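To make the comparison concrete, the following minimal sketch computes Cronbach’s α on a small binary data table after four of the treatments mentioned above: excluding participants with missing answers, filling in zero, and replacing by the item’s mean or by the participant’s mean. It is an illustration only, not the procedure used in the study: the data table `rows` is invented, the function names are hypothetical, missing answers are assumed to be coded as `None`, and α is computed with the standard formula α = k/(k−1) · (1 − Σ item variances / variance of total scores) using sample variances.

```python
from statistics import mean, variance

def cronbach_alpha(rows):
    """Cronbach's alpha for a complete participants-by-items table."""
    k = len(rows[0])
    item_vars = [variance([r[j] for r in rows]) for j in range(k)]
    total_var = variance([sum(r) for r in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

def listwise_delete(rows):
    """Exclude every participant with at least one missing answer."""
    return [r for r in rows if None not in r]

def fill_zero(rows):
    """Treat each missing answer as incorrect (0)."""
    return [[0 if x is None else x for x in r] for r in rows]

def fill_item_mean(rows):
    """Replace a missing answer by the mean of the observed answers to that item."""
    k = len(rows[0])
    means = [mean(r[j] for r in rows if r[j] is not None) for j in range(k)]
    return [[means[j] if x is None else x for j, x in enumerate(r)] for r in rows]

def fill_person_mean(rows):
    """Replace a missing answer by the mean of that participant's observed answers."""
    filled = []
    for r in rows:
        m = mean(x for x in r if x is not None)
        filled.append([m if x is None else x for x in r])
    return filled

# Invented toy data: 6 participants x 4 yes/no items, None = missing answer.
rows = [
    [1, 1, 1, 1],
    [1, 1, None, 1],
    [1, 0, 1, 0],
    [0, 0, 0, None],
    [0, 0, 0, 0],
    [1, 1, 0, 1],
]

treatments = {
    "exclude participant": listwise_delete,
    "fill with zero": fill_zero,
    "item mean": fill_item_mean,
    "participant mean": fill_person_mean,
}

for name, treat in treatments.items():
    print(f"{name:20s} alpha = {cronbach_alpha(treat(rows)):.3f}")
```

With a table this small the differences between treatments are not meaningful in themselves; the sketch only shows where each treatment enters the computation. Leaving the cell empty is omitted because its effect depends on how the analysis software handles empty cells.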

[1] Toni Rietveld et al. Statistics in Language Research: Analysis of Variance, 2005.

[2] J. Schafer et al. Missing data: our view of the state of the art, 2002, Psychological Methods.

[3] Andrew Harvey et al. Estimating Missing Observations in Economic Time Series, 1984.

[4] A. Rupp et al. Impact of Missing Data on the Detection of Differential Item Functioning, 2009.

[5] Klaas Sijtsma et al. On the Use, the Misuse, and the Very Limited Usefulness of Cronbach’s Alpha, 2008, Psychometrika.

[6] Michael R. Harwell et al. Training Graduate Students in Educational Statistics: A National Survey, 1996.

[7] R. Peterson. A Meta-analysis of Cronbach's Coefficient Alpha, 1994.

[8] Tomas J. Philipson et al. Data Markets, Missing Data, and Incentive Pay, 2001.

[9] Daniel Fallon et al. The Buffalo Upon the Chimneypiece, 2006.

[10] Holmes Finch et al. Estimation of Item Response Theory Parameters in the Presence of Missing Data, 2008.

[11] Herbert W. Marsh et al. Pairwise Deletion for Missing Data in Structural Equation Models: Nonpositive Definite Matrices, Parameter Estimates, Goodness of Fit, and Adjusted Sample Sizes, 1998.

[12] Cindy M. Walker et al. Impact of Missing Data on Person-Model Fit and Person Trait Estimation, 2008.

[13] Keming Yang. Making Sense of Statistical Methods in Social Research, 2010.

[14] Charles F. Manski et al. Censoring of Outcomes and Regressors Due to Survey Nonresponse: Identification and Estimation Using Weights and Imputations, 1998.

[15] P. Allison. Estimation of Linear Models with Incomplete Data, 1987.

[16] Craig K. Enders et al. An introduction to modern missing data analyses, 2010, Journal of School Psychology.

[17] Raquel Florez-Lopez et al. Effects of missing data in credit risk scoring. A comparative analysis of methods to achieve robustness in the absence of sufficient data, 2010.

[18] L. Cronbach. Coefficient alpha and the internal structure of tests, 1951.

[19] Jörg-Peter Schräpler et al. Respondent Behavior in Panel Studies, 2004.

[20] Anne Lazaraton. Current Trends in Research Methodology and Statistics in Applied Linguistics, 2000.

[21] G. King et al. Analyzing Incomplete Political Science Data: An Alternative Algorithm for Multiple Imputation, 2001, American Political Science Review.

[22] Claudiu Daniel Tufis. Multiple Imputation as a Solution to the Missing Data Problem in the Social Sciences, 2008.

[23] D. Rubin et al. Statistical Analysis with Missing Data, 1988.

[24] Craig K. Enders et al. Missing Data in Educational Research: A Review of Reporting Practices and Suggestions for Improvement, 2004.

[25] Linda de Serres et al. Psychometric validation of the Sentence Verification Technique to assess L2 reading comprehension ability, 2014.

[26] D. Altman et al. Missing data, 2007, BMJ: British Medical Journal.

[27] Anne Lazaraton et al. Forming a Discipline: Applied Linguists' Literacy in Research Methodology and Statistics, 1987.

[28] Kazumi Matsuoka et al. Experimental Methods in Language Acquisition Research, 2012.

[29] B. J. Winer. Statistical Principles in Experimental Design, 1992.

[30] Christopher Winship et al. Loglinear Models with Missing Data: A Latent Class Approach, 1989.

[31] C. Nicoletti et al. Estimating Income Poverty in the Presence of Missing Data and Measurement Error, 2009.

[32] Bryan S. Graham et al. Efficiency Bounds for Missing Data Models with Semiparametric Restrictions, 2008.

[33] Elana Shohamy et al. Test impact revisited: washback effect over time, 1996.

[34] B. J. Winer et al. Statistical Principles in Experimental Design, 2nd Edition, 1973.