ETS CONTRIBUTIONS TO THE QUANTITATIVE ASSESSMENT OF ITEM, TEST, AND SCORE FAIRNESS

ETS has been a leader in developing quantitative procedures for fairness assessment, and this chapter reviews those efforts. The first section covers differential prediction and differential validity procedures, which examine whether test scores predict a criterion, such as performance in college, in a similar manner across subgroups. The second section, which makes up the bulk of the chapter, focuses on item-level fairness, or differential item functioning. The third section considers research on whether tests built to the same set of specifications produce scores that are related in the same way across different gender and ethnic groups. The final section discusses limitations of the approaches.
