Empirical versus subjective procedures for identifying gender differences in science test items

When judgmental and statistical procedures are both used to identify potentially gender-biased items in a test, to what extent do their results agree? In this study, both procedures were applied to the items of a statewide, 78-item, multiple-choice test of science knowledge. The sensitivity reviewers flagged only one item as potentially biased, and the statistical procedure did not flag that item. Conversely, none of the nine items flagged by the Mantel-Haenszel procedure was flagged by the sensitivity reviewers. Eight of the nine statistically flagged items were differentially easier for males, and four of those eight measured the same category of objectives. The authors conclude that judgmental and statistical procedures each provide useful information and that both should be used in test construction. They caution, however, that content-validity issues must be addressed when making decisions based on the results of either procedure.
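The abstract names the Mantel-Haenszel procedure without describing it. As a rough illustration, the sketch below computes, for a single item, the Mantel-Haenszel common odds ratio, the ETS delta-scale statistic MH D-DIF (equal to -2.35 times the natural log of the odds ratio), and the continuity-corrected MH chi-square, with examinees matched on total test score. It is a minimal sketch and not the authors' analysis. The `Stratum` and `flag` names, the choice of males as the reference group and females as the focal group, and the flag rule (chi-square above 3.84, i.e. p < .05 with one degree of freedom) are assumptions for illustration, because the abstract does not state the criterion the authors used. The item counts are made up.

```python
import math
from dataclasses import dataclass


@dataclass
class Stratum:
    """Counts for one total-score level (hypothetical helper)."""
    ref_right: int  # reference group (assumed: males), correct
    ref_wrong: int  # reference group, incorrect
    foc_right: int  # focal group (assumed: females), correct
    foc_wrong: int  # focal group, incorrect


def mantel_haenszel(strata):
    """Return (alpha_MH, MH D-DIF, continuity-corrected MH chi-square)."""
    num = den = 0.0
    sum_a = sum_e = sum_v = 0.0
    for s in strata:
        n_ref = s.ref_right + s.ref_wrong
        n_foc = s.foc_right + s.foc_wrong
        n = n_ref + n_foc
        if n < 2:
            continue  # too few examinees at this score level to contribute
        m_right = s.ref_right + s.foc_right
        m_wrong = s.ref_wrong + s.foc_wrong
        num += s.ref_right * s.foc_wrong / n
        den += s.ref_wrong * s.foc_right / n
        sum_a += s.ref_right
        sum_e += n_ref * m_right / n  # expected reference-correct count under no DIF
        sum_v += n_ref * n_foc * m_right * m_wrong / (n * n * (n - 1))
    alpha = num / den  # common odds ratio; > 1 favors the reference group
    d_dif = -2.35 * math.log(alpha)  # ETS delta scale; negative favors reference
    chi2 = (abs(sum_a - sum_e) - 0.5) ** 2 / sum_v
    return alpha, d_dif, chi2


CHI2_CRIT = 3.84  # assumed rule: chi-square with 1 df significant at .05


def flag(strata):
    """Hypothetical flag rule; returns None if the item is not flagged."""
    alpha, _, chi2 = mantel_haenszel(strata)
    if chi2 <= CHI2_CRIT:
        return None
    return "favors reference" if alpha > 1 else "favors focal"


if __name__ == "__main__":
    # Made-up counts for one item at two total-score levels.
    item = [Stratum(30, 20, 20, 30), Stratum(40, 10, 30, 20)]
    alpha, d_dif, chi2 = mantel_haenszel(item)
    print(f"alpha_MH={alpha:.3f}  MH D-DIF={d_dif:.2f}  chi2={chi2:.2f}")
    print(flag(item))
```

With males as the reference group, a common odds ratio above 1 (equivalently, a negative MH D-DIF) means that, at matched total scores, males had higher odds of answering the item correctly. That is the pattern the abstract reports for eight of the nine flagged items. The sketch does not guard against strata in which every examinee answered the same way across both groups, which would make the denominators zero.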
