Chapter 8: Meta-analysis of Test Performance When There is a “Gold Standard”

Synthesizing information on test performance metrics such as sensitivity, specificity, predictive values, and likelihood ratios is often an important part of a systematic review of a medical test. Because many metrics of test performance are of interest, the meta-analysis of medical tests is more complex than the meta-analysis of interventions or associations. Sometimes a helpful way to summarize medical test studies is to provide a “summary point,” that is, a summary sensitivity paired with a summary specificity. Other times, when the sensitivity or specificity estimates vary widely or when the positivity threshold differs across studies, it is more helpful to synthesize data using a “summary line” that describes how the average sensitivity changes with the average specificity. Choosing the most helpful summary is subjective, and in some cases both summaries provide meaningful and complementary information. Because sensitivity and specificity are not independent across studies, the meta-analysis of medical tests is fundamentally a multivariate problem and should be addressed with multivariate methods. More complex analyses are needed when studies report results at multiple thresholds for test positivity. At the same time, quantitative analyses are used to explore and explain any observed dissimilarity (heterogeneity) in the results of the examined studies; this can be done in the context of proper (multivariate) meta-regressions.
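For illustration only, the Python sketch below computes both kinds of summaries on made-up data: a “summary point” from an approximate bivariate random-effects model fitted to logit sensitivity and logit specificity (a two-stage normal approximation on the logit scale rather than an exact binomial likelihood), and a “summary line” from a Moses-Littenberg regression. The 2x2 counts, the uniform 0.5 continuity correction, the starting values, and the optimizer choice are all hypothetical choices made for this sketch, not methods prescribed by the chapter.

```python
# Minimal sketch: summary point (approximate bivariate normal-normal model)
# and summary line (Moses-Littenberg SROC) for a handful of made-up studies.
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit, logit

# Hypothetical per-study 2x2 counts: TP, FN, FP, TN.
counts = np.array([
    [45,  5, 10,  90],
    [30, 10,  8,  72],
    [60, 15, 20,  80],
    [25,  3,  5,  50],
    [80, 20, 30, 120],
], dtype=float)
tp, fn, fp, tn = (counts[:, j] + 0.5 for j in range(4))  # 0.5 continuity correction

y = np.column_stack([logit(tp / (tp + fn)), logit(tn / (tn + fp))])  # logit sens, logit spec
s2 = np.column_stack([1 / tp + 1 / fn, 1 / tn + 1 / fp])             # within-study variances

def neg_loglik(theta):
    """Approximate bivariate normal-normal negative log-likelihood."""
    mu = theta[:2]
    tau = np.exp(theta[2:4])   # between-study SDs, kept positive
    rho = np.tanh(theta[4])    # between-study correlation in (-1, 1)
    nll = 0.0
    for yi, s2i in zip(y, s2):
        cov = np.array([[tau[0] ** 2 + s2i[0], rho * tau[0] * tau[1]],
                        [rho * tau[0] * tau[1], tau[1] ** 2 + s2i[1]]])
        diff = yi - mu
        nll += 0.5 * (np.log(np.linalg.det(cov)) + diff @ np.linalg.solve(cov, diff))
    return nll

start = np.array([y[:, 0].mean(), y[:, 1].mean(), np.log(0.5), np.log(0.5), 0.0])
fit = minimize(neg_loglik, start, method="Nelder-Mead")
summary_sens, summary_spec = expit(fit.x[:2])
print(f"summary sensitivity ~ {summary_sens:.3f}, summary specificity ~ {summary_spec:.3f}")

# Summary line (Moses-Littenberg): regress D = logit(sens) + logit(spec)
# on S = logit(sens) - logit(spec); the fitted line implies a summary ROC curve.
D = y[:, 0] + y[:, 1]
S = y[:, 0] - y[:, 1]
b, a = np.polyfit(S, D, 1)  # slope b, intercept a
fpr = np.array([0.1, 0.2, 0.3])
sroc_tpr = expit((a + (1 + b) * logit(fpr)) / (1 - b))
print("SROC sensitivity at FPR 0.1/0.2/0.3:", np.round(sroc_tpr, 3))
```

Because the two study-level effects are modeled jointly with a between-study correlation, the fitted summary point respects the dependence between sensitivity and specificity that a pair of separate univariate meta-analyses would ignore; the Moses-Littenberg line is shown only as the simpler, older alternative for a summary curve.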
