Setting Standards to a Scientific Literacy Test for Adults Using the Item-Descriptor (ID) Matching Method

Common standard setting methods such as the Angoff or the Bookmark method require panellists to imagine minimally competent persons or to estimate response probabilities in order to define cut scores. Imagining such persons and how they would perform is criticised as cognitively demanding. These already challenging judgemental tasks become even more difficult when experts have to deal with very heterogeneous or insufficiently studied populations, such as adults. The Item-Descriptor (ID) Matching method can reduce the arbitrariness of such subjective evaluations by focusing on comparatively objective judgements about the content of tests. At our standard setting workshop, seven experts matched the demands of 22 items of a scientific literacy test for adults with the abilities described in the performance level descriptions (PLDs) of the two proficiency levels Basic and Advanced. Since the ID Matching method has hardly been used in European standard settings, it has not yet been evaluated comprehensively. Information about the validity of a standard setting method is essential for judging whether the resulting cut scores are appropriate and correctly interpreted. In this chapter, we provide procedural and internal evidence for the use and interpretation of the cut scores and PLDs derived with the ID Matching method. With regard to procedural validity, detailed questionnaires showed high and consensual agreement among the experts regarding explicitness, practicability, implementation, and feedback. Inter-rater reliability for the panellists’ classification of items was initially low but increased across subsequent rounds (κ = .38 to κ = .63). These values are consistent with findings of earlier studies and thus support internal validity. We argue that the cut scores and PLDs derived from the application of the ID Matching method are appropriate for categorising adults as scientifically illiterate, literate, and advanced literate.
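The reported agreement statistics can be illustrated with a short computation. The abstract does not state which kappa variant was used; Fleiss' kappa is a common choice when more than two raters (here, seven panellists) classify items into categories (e.g., below Basic, Basic, Advanced). The sketch below uses an invented rating matrix purely for illustration and is not the study's data.

```python
# Minimal sketch of Fleiss' kappa for agreement among multiple panellists.
# The rating matrix below is hypothetical; it is NOT the data from the study.
import numpy as np

def fleiss_kappa(counts: np.ndarray) -> float:
    """counts[i, j] = number of raters who placed item i into category j."""
    n_items, _ = counts.shape
    n_raters = counts.sum(axis=1)[0]                    # assumes equal raters per item
    p_j = counts.sum(axis=0) / (n_items * n_raters)     # marginal category proportions
    P_i = (np.square(counts).sum(axis=1) - n_raters) / (n_raters * (n_raters - 1))
    P_bar, P_e = P_i.mean(), np.square(p_j).sum()       # observed vs. chance agreement
    return (P_bar - P_e) / (1 - P_e)

# Hypothetical example: 5 items, 7 raters, 3 ordered categories
# (below Basic, Basic, Advanced); each row sums to the number of raters.
ratings = np.array([
    [5, 2, 0],
    [1, 5, 1],
    [0, 2, 5],
    [3, 3, 1],
    [0, 1, 6],
])
print(round(fleiss_kappa(ratings), 2))
```

Values around .38 would fall into the "fair" and values around .63 into the "substantial" agreement band of the commonly used Landis and Koch benchmarks.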
