Measurement and control of bias in patient reported outcomes using multidimensional item response theory

Background: Patient-reported outcome (PRO) measures play a key role in the advancement of patient-centered care research. The accuracy of inferences, the relevance of predictions, and the true nature of the associations made with PRO data depend on the validity of these measures. Errors inherent to self-report measures can seriously bias the estimation of the constructs assessed by a scale. A well-documented disadvantage of self-report measures is their sensitivity to response style (RS) effects, such as a respondent's tendency to select the extremes of a rating scale. Although the biasing effect of extreme responding on constructs measured by self-report tools has been widely acknowledged and studied across disciplines, little attention has been given to the development and systematic application of methodologies to assess and control for this effect in PRO measures.

Methods: We reviewed the methodological approaches that have been proposed to study extreme response style (ERS) effects. We applied a multidimensional item response theory model to simultaneously estimate and correct for the impact of ERS on trait estimation in a PRO instrument. Model estimates were used to study the biasing effects of ERS on sum scores for individuals with the same amount of the targeted trait but different levels of ERS. We evaluated the effect of joint estimation of multiple scales and ERS on trait estimates and demonstrated the biasing effects of ERS on these trait estimates when they are used as explanatory variables.

Results: A four-dimensional model accounting for ERS bias provided a better fit to the response data. Higher levels of ERS produced bias in total scores, and this bias varied as a function of the trait estimates. The effect of ERS was greater when the pattern of extreme responding was the same across multiple scales modeled jointly. The estimated item category intercepts provided evidence of content-independent category selection. Uncorrected trait estimates used as explanatory variables in prediction models showed downward bias.

Conclusions: A comprehensive evaluation of the psychometric quality and soundness of PRO assessment measures should incorporate the study of ERS as a potential nuisance dimension affecting the accuracy and validity of scores and the impact of PRO data in clinical research and decision making.
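
The abstract's point about sum scores differing for individuals with the same trait level but different ERS levels can be illustrated with a minimal sketch. It assumes a multinomial-logit (nominal-response-type) item model in which each category has one score on the trait dimension and one on an ERS dimension. The scoring vectors, the zero intercepts, the 5-category format, and the function names are hypothetical choices for illustration, not the model specification or estimates reported in the paper.

```python
import numpy as np

# Hypothetical category scoring for a 5-category item (assumption, not from the paper).
TRAIT_SCORES = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])  # ordered scores on the trait
ERS_SCORES = np.array([1.0, 0.0, 0.0, 0.0, 1.0])      # only the two end categories load on ERS


def category_probs(theta, ers, intercepts):
    """Multinomial-logit probabilities of the 5 categories for one item."""
    logits = TRAIT_SCORES * theta + ERS_SCORES * ers + intercepts
    logits = logits - logits.max()  # subtract the max for numerical stability
    p = np.exp(logits)
    return p / p.sum()


def expected_sum_score(theta, ers, n_items=10):
    """Expected sum score (items scored 0-4) over n_items identical items."""
    intercepts = np.zeros(5)  # assumed equal category intercepts
    p = category_probs(theta, ers, intercepts)
    return n_items * float(p @ np.arange(5))


if __name__ == "__main__":
    # Same trait level, three ERS levels: the expected sum score shifts with ERS.
    for theta in (-1.0, 0.0, 1.0):
        scores = [expected_sum_score(theta, ers) for ers in (-1.0, 0.0, 1.0)]
        print(theta, [round(s, 2) for s in scores])
```

Under these assumptions, ERS leaves the expected sum score unchanged at the scale midpoint (theta = 0) and pushes it away from the midpoint elsewhere: upward for positive trait levels and downward for negative ones. This is one simple way a response style can distort total scores as a function of the trait.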
