Failure of a numerical quality assessment scale to identify potential risk of bias in a systematic review: a comparison study

Background: Assessing the methodological quality of primary studies is an essential component of systematic reviews. Following a systematic review that used a domain-based system [United States Preventive Services Task Force (USPSTF)] to assess methodological quality, a commonly used numerical rating scale (Downs and Black) was also used to evaluate the included studies, and the quality ratings assigned by the two methods were compared. Both tools were used to assess the 20 randomized and quasi-randomized controlled trials included in the review, all of which examined an exercise intervention for chronic musculoskeletal pain. Inter-rater reliability and levels of agreement were determined using intraclass correlation coefficients (ICC). The influence of quality on pooled effect size was examined by calculating the between-group standardized mean difference (SMD).

Results: Inter-rater reliability indicated at least substantial levels of agreement for the USPSTF system (ICC 0.85; 95% CI 0.66, 0.94) and the Downs and Black scale (ICC 0.94; 95% CI 0.84, 0.97). The overall level of agreement between tools (ICC 0.80; 95% CI 0.57, 0.92) was also good. However, the USPSTF system identified a number of studies (n = 3/20) as “poor” due to potential risks of bias. Analysis revealed substantially greater pooled effect sizes in these studies (SMD −2.51; 95% CI −4.21, −0.82) compared to those rated as “fair” (SMD −0.45; 95% CI −0.65, −0.25) or “good” (SMD −0.38; 95% CI −0.69, −0.08).

Conclusions: In this example, use of a numerical rating scale failed to identify studies at increased risk of bias, and could have potentially led to imprecise estimates of treatment effect. Although based on a small number of included studies within an existing systematic review, we found that the domain-based system provided a more structured framework by which qualitative decisions concerning overall quality could be made, and was useful for detecting potential sources of bias in the available evidence.
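To make the effect-size measure concrete, the following is a minimal sketch of how a between-group SMD and its 95% CI can be computed for a single trial. It assumes the Cohen's d form with a pooled standard deviation and a common large-sample standard error; the abstract does not state which SMD estimator or software was used, and the function name `smd_with_ci` and the input values are hypothetical, not data from the review.

```python
import math

def smd_with_ci(mean_t, sd_t, n_t, mean_c, sd_c, n_c, z=1.96):
    """Between-group standardized mean difference (Cohen's d form) with a CI.

    Assumption: pooled-SD standardization and a large-sample standard error.
    This is an illustrative sketch, not the estimator used in the review.
    """
    # Pooled standard deviation across treatment and control groups
    pooled_sd = math.sqrt(
        ((n_t - 1) * sd_t ** 2 + (n_c - 1) * sd_c ** 2) / (n_t + n_c - 2)
    )
    d = (mean_t - mean_c) / pooled_sd
    # Approximate standard error of d
    se = math.sqrt((n_t + n_c) / (n_t * n_c) + d ** 2 / (2 * (n_t + n_c)))
    return d, d - z * se, d + z * se

# Hypothetical pain scores (lower is better): exercise group vs control group
d, lower, upper = smd_with_ci(3.0, 2.0, 30, 4.0, 2.0, 30)
print(f"SMD {d:.2f}; 95% CI {lower:.2f}, {upper:.2f}")
```

A negative SMD here favours the intervention, which matches the direction of the pooled estimates reported in the Results.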
