Precision and Differential Item Functioning on a Testlet-Based Test: The 1991 Law School Admissions Test as an Example

Two components of the Law School Admissions Test (LSAT), reading comprehension and analytic reasoning, are not constructed of individual items that can function autonomously. Instead, each consists of four clusters of items, and each cluster refers to a common stem: in reading comprehension the common part is a single passage; in analytic reasoning it is a common situation. Such interdependent clusters of items have come to be called testlets. It has been found that when a test is constructed of testlets, traditional treatments that regard individual items as independent entities tend to yield overly optimistic estimates of reliability. It has also been found that individual items may pass muster when their differential performance across subgroups is examined one item at a time, yet this may not hold once the items within a testlet are treated as a coherent unit. Findings can be divided into three categories: (a) overall performance of individual subgroups on the test, (b) the reliability of each section, and ...
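The reliability point can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the analysis used in the study: it swaps in coefficient alpha as the reliability index and generates data from a hypothetical testlet-effect logistic model in which each examinee receives a random effect `gamma` shared by every item in a testlet. The function names, the six items per testlet, the difficulty range, and `testlet_sd` are all illustrative assumptions; only the four testlets per section come from the text. Alpha computed over individual items counts within-testlet covariance as true-score variance, whereas alpha computed over testlet sum scores does not.

```python
import math
import random
import statistics


def cronbach_alpha(rows):
    """Coefficient alpha, where rows[i][j] is examinee i's score on part j."""
    k = len(rows[0])
    part_vars = [statistics.pvariance([row[j] for row in rows]) for j in range(k)]
    total_var = statistics.pvariance([sum(row) for row in rows])
    return (k / (k - 1)) * (1.0 - sum(part_vars) / total_var)


def simulate_testlet_responses(n_examinees, n_testlets, items_per_testlet,
                               testlet_sd, seed=0):
    """Simulate 0/1 responses under a hypothetical testlet-effect model.

    P(correct) = 1 / (1 + exp(-(theta + gamma - b))): theta is the examinee's
    ability, gamma is a person-by-testlet effect shared by every item in that
    testlet (the source of local dependence), and b is the item difficulty.
    Returns data[person][testlet][item].
    """
    rng = random.Random(seed)
    difficulties = [[rng.uniform(-1.5, 1.5) for _ in range(items_per_testlet)]
                    for _ in range(n_testlets)]
    data = []
    for _ in range(n_examinees):
        theta = rng.gauss(0.0, 1.0)
        person = []
        for item_bs in difficulties:
            gamma = rng.gauss(0.0, testlet_sd)  # one draw shared by all items in the testlet
            person.append([int(rng.random() < 1.0 / (1.0 + math.exp(b - theta - gamma)))
                           for b in item_bs])
        data.append(person)
    return data


def item_vs_testlet_alpha(data):
    """Return (alpha over individual items, alpha over testlet sum scores)."""
    item_rows = [[x for testlet in person for x in testlet] for person in data]
    testlet_rows = [[sum(testlet) for testlet in person] for person in data]
    return cronbach_alpha(item_rows), cronbach_alpha(testlet_rows)


# Four testlets per section, as described above; six items per testlet
# and testlet_sd=1.0 are illustrative assumptions.
dependent = simulate_testlet_responses(2000, 4, 6, testlet_sd=1.0, seed=1)
alpha_items, alpha_testlets = item_vs_testlet_alpha(dependent)

# Control: no testlet effect, so items are locally independent given theta.
independent = simulate_testlet_responses(2000, 4, 6, testlet_sd=0.0, seed=1)
alpha_items_ind, alpha_testlets_ind = item_vs_testlet_alpha(independent)

print(f"with testlet effect: items {alpha_items:.3f}, testlets {alpha_testlets:.3f}")
print(f"no testlet effect:   items {alpha_items_ind:.3f}, testlets {alpha_testlets_ind:.3f}")
```

Under these assumptions the item-level alpha should come out clearly higher when the testlet effect is present, while the control should give two nearly equal values. With equal-sized testlets, the gap between the two alphas is a positive multiple of the difference between the average within-testlet and the average between-testlet item covariance, so treating items as independent inflates the estimate only to the extent that such local dependence exists.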