Checking Equity: Why Differential Item Functioning Analysis Should Be a Routine Part of Developing Conceptual Assessments

A test may be unfair when students with the same knowledge but from different demographic groups perform differently on its items. Identifying and addressing this differential item functioning (DIF) helps ensure a fair, unbiased test. This Research Methods paper will help biology education researchers identify DIF items in their assessments.
