DIF Statistical Inference and Detection without Knowing Anchoring Items

arXiv:2110.11112v1 [stat.ME], 21 Oct 2021

Establishing the invariance property of an instrument (e.g., a questionnaire or test) is a key step in establishing its measurement validity. Measurement invariance is typically assessed by differential item functioning (DIF) analysis, i.e., detecting DIF items whose response distribution depends not only on the latent trait measured by the instrument but also on group membership. DIF analysis is confounded by the group difference in the latent trait distributions. Many DIF analyses require knowing several anchor items that are DIF-free and then draw inference on whether each of the remaining items is a DIF item, where the anchor items are used to calibrate the latent trait distributions. When no prior information on anchor items is available, or when some anchor items are misspecified, item purification methods and regularized estimation methods can be used. The former iteratively purify the anchor set by a stepwise model selection procedure, and the latter select the DIF-free items by a LASSO-type regularization approach. Unfortunately, unlike methods based on a correctly specified anchor set, these methods are not guaranteed to provide valid statistical inference (e.g., confidence intervals and p-values). In this paper, we propose a new method for DIF analysis under a multiple indicators and multiple causes (MIMIC) model for DIF. This method adopts a minimal L1 norm condition for identifying the latent trait distributions. It can not only accurately estimate the DIF effects of individual items without requiring prior knowledge about an anchor set, but also draw valid statistical inference, which yields accurate detection of DIF items. The inference results further allow us to control the type-I error for DIF detection. We conduct simulation studies to evaluate the performance of the proposed method and compare it with the anchor-set-based likelihood ratio test approach and the LASSO approach. The proposed method is further applied to the three personality scales of the Eysenck Personality Questionnaire-Revised (EPQ-R).
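
To illustrate what a minimal L1 norm condition can look like in practice, the sketch below works through a small numerical example. The abstract does not spell out the parameterization, so the following is an assumption made only for illustration: item j has a discrimination a_j and a DIF effect gamma_j, and without anchor items the DIF effects are determined only up to a common shift, gamma_j - c * a_j, where c is an unknown difference in group means of the latent trait. Under that assumption, choosing c to minimize the sum of |gamma_j - c * a_j| is a weighted median problem. The function name `min_l1_shift` and all numbers are hypothetical, and this is not the paper's estimation procedure.

```python
import numpy as np


def min_l1_shift(gamma_tilde, a):
    """Return c minimising sum_j |gamma_tilde[j] - c * a[j]|.

    Since |g - c*a| = |a| * |g/a - c|, the minimiser is a weighted
    median of the ratios g/a with weights |a| (assumes every a[j] != 0).
    """
    gamma_tilde = np.asarray(gamma_tilde, dtype=float)
    a = np.asarray(a, dtype=float)
    ratios = gamma_tilde / a
    weights = np.abs(a)
    order = np.argsort(ratios)
    cum = np.cumsum(weights[order])
    # First sorted position whose cumulative weight reaches half the total.
    k = np.searchsorted(cum, 0.5 * cum[-1])
    return ratios[order][k]


# Hypothetical example: six items, two of which carry DIF.
a = np.array([1.0, 1.2, 0.8, 1.5, 1.0, 0.9])            # discriminations (assumed)
gamma_true = np.array([0.0, 0.0, 0.0, 0.0, 0.8, -0.6])  # sparse DIF effects

# DIF effects as they might come out of a fit that used an arbitrary
# identification constraint: shifted by an unknown amount c0 * a_j.
c0 = 0.5
gamma_tilde = gamma_true + c0 * a

c = min_l1_shift(gamma_tilde, a)
gamma_hat = gamma_tilde - c * a  # representative with the smallest L1 norm

print("recovered shift:", c)
print("DIF effects under the minimal L1 condition:", np.round(gamma_hat, 3))
```

In this toy case most items are DIF-free, so the weighted median lands on the common shift and the sparse DIF vector is recovered. When DIF items are in the majority, such a condition need not pick out the intended solution.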
