Item Response Theory Models for Wording Effects in Mixed-Format Scales

Many scales contain both positively and negatively worded items. Reverse coding the negatively worded items may not be enough to make them function as positively worded items do. In this study, we discussed the drawbacks of existing approaches to wording effects in mixed-format scales and used bi-factor item response theory (IRT) models to test the assumption underlying reverse coding and to evaluate the magnitude of the wording effect. The parameters of the bi-factor IRT models can be estimated with existing computer programs. Two empirical examples, one from the Program for International Student Assessment (PISA) and one from the Trends in International Mathematics and Science Study (TIMSS), were given to demonstrate the advantages of the bi-factor approach over traditional ones. The wording effect in both data sets was substantial, and ignoring it resulted in overestimated test reliability and biased person measures.
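
To make the bi-factor idea concrete, the following is a minimal sketch, assuming a partial-credit (Rasch-type) formulation. The abstract does not give the model equations, so every symbol here is an assumption introduced for illustration: \theta_n is the general latent trait of person n, \gamma_n is a person-specific wording factor, \delta_{ij} is the j-th step difficulty of item i, and w_i is an indicator equal to 1 for (reverse-coded) negatively worded items and 0 otherwise.

```latex
% Illustrative sketch only; the notation is assumed, not taken from the abstract.
% P_{nij} is the probability that person n responds in category j of item i.
\[
  \log\frac{P_{nij}}{P_{ni(j-1)}} \;=\; \theta_n + w_i\,\gamma_n - \delta_{ij},
  \qquad
  w_i =
  \begin{cases}
    1, & \text{item } i \text{ is negatively worded},\\
    0, & \text{item } i \text{ is positively worded},
  \end{cases}
\]
% The general and wording factors are assumed to be uncorrelated:
\[
  \operatorname{Cov}(\theta_n, \gamma_n) = 0 .
\]
```

Under this sketch, reverse coding would be sufficient only if the variance of \gamma_n were zero, in which case the model reduces to a unidimensional partial credit model. A larger variance of \gamma_n relative to that of \theta_n would indicate a stronger wording effect.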
