Evaluating test validity: reprise and progress

The AERA, APA, and NCME Standards define validity as ‘the degree to which evidence and theory support the interpretations of test scores for proposed uses of tests’. A century of disagreement about validity does not mean that no substantial progress has been made. This consensus definition brings together interpretation and use as a single idea, not a sequence of steps. Just as test design is framed by a particular context of use, so too must validation research focus on the adequacy of tests for specific purposes. The consensus definition also carries forward major reforms in validity theory, begun in the 1970s, that rejected separate types of validity evidence for different types of tests, e.g., content validity for achievement tests and predictive correlations for employment tests. By referring to both ‘evidence and theory’, the current definition requires not just that a test be well designed on the basis of theory but also that evidence be collected to verify that the test is working as intended. Having taught policy-makers, citizens, and the courts to use the word validity, especially in high-stakes applications, we cannot after the fact substitute a more limited, technical definition of validity. An official definition provides clarity even for those who disagree, because it serves as a touchstone and obliges them to acknowledge when they are departing from it.
