Gauging Item Alignment Through Online Systems While Controlling for Rater Effects

The alignment of test items to content standards is critical to the validity of decisions made from standards-based tests. Alignment is generally determined from the judgments of a panel of content experts, with ratings either averaged or reconciled through consensus discussion. When the pool of items to be reviewed is large, or the content experts are widely dispersed geographically, panel methods present significant challenges. This article illustrates an online methodology for gauging item alignment that does not require raters to convene in person, reduces the overall cost of the study, increases scheduling flexibility, and offers an efficient means of reviewing large item banks. Latent trait methods are applied to the data to control for between-rater differences in severity, evaluate intrarater consistency, and provide item-level diagnostic statistics. The methodology is illustrated with a large pool of 1,345 interim-formative mathematics test items. Implications for the field and limitations of the approach are discussed.
