MANAGING THE INFLUENCE OF DIF FROM BIG ITEMS: THE 1988 ADVANCED PLACEMENT HISTORY TEST AS AN EXAMPLE

Building tests out of items that individually take a substantial amount of examinee time brings with it a number of problems. One major problem is that it is often too difficult and too expensive to pretest such large items extensively. Thus, the kinds of screening for flaws that are routine for multiple-choice items are seldom done for large items. In addition, because there are so few large items on an operational test, not counting an entire item that is found to be flawed in an operational administration may be tantamount to aborting that administration. In this article, we examine the efficacy of an alternative to dropping the item outright: continuous item weighting. We illustrate this alternative with data from the 1988 administration of the College Board's Advanced Placement History Test.
