The reproducibility of a method to identify the overuse and underuse of medical procedures.

BACKGROUND: To assess the overuse and underuse of medical procedures, various methods have been developed, but their reproducibility has not been evaluated. This study estimates the reproducibility of one commonly used method.

METHODS: We performed a parallel, three-way replication of the RAND-University of California at Los Angeles appropriateness method as applied to two medical procedures, coronary revascularization and hysterectomy. Three nine-member multidisciplinary panels of experts were assembled for each procedure by stratified random sampling from a list of experts nominated by the relevant specialty societies. Each panel independently rated the same set of clinical scenarios for the appropriateness of the relevant procedure on a risk-benefit scale ranging from 1 to 9. Final ratings were used to classify the procedure in each scenario as necessary or not necessary (to evaluate underuse) and as inappropriate or not inappropriate (to evaluate overuse). Reproducibility was measured by overall agreement and by the kappa statistic. The criteria for underuse and overuse derived from these ratings were then applied to real populations of patients who had undergone coronary revascularization or hysterectomy.

RESULTS: The rates of agreement among the three coronary-revascularization panels were 95, 94, and 96 percent for inappropriate-use scenarios and 93, 92, and 92 percent for necessary-use scenarios. Agreement among the three hysterectomy panels was 88, 70, and 74 percent for inappropriate-use scenarios. Scenarios involving necessary use of hysterectomy were not assessed. The three-way kappa statistic to detect overuse was 0.52 for coronary revascularization and 0.51 for hysterectomy. The three-way kappa statistic to detect underuse of coronary revascularization was 0.83.
Application of individual panels' criteria to real populations of patients resulted in a 100 percent variation in the proportion of cases classified as inappropriate and a 20 percent variation in the proportion of cases classified as necessary.

CONCLUSIONS: The appropriateness method is far from perfect. Appropriateness criteria may be useful in comparing levels of appropriate procedures among populations but should not by themselves be used to direct care for individual patients.
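The abstract reports reproducibility as overall agreement and a three-way kappa but gives no formulas. The following is a minimal Python sketch under stated assumptions, not the study's analysis code: (1) the three agreement figures per procedure are treated as pairwise panel-to-panel agreement rates; (2) the three-way kappa is computed as Fleiss' kappa, treating each panel as one rater; and (3) a simplified, assumed classification rule counts a scenario as inappropriate when the panel's median 1-9 rating is 3 or lower, ignoring the disagreement handling of the full RAND/UCLA method. The function names (`is_inappropriate`, `pairwise_agreement`, `fleiss_kappa`) and the toy data are hypothetical.

```python
from itertools import combinations
from statistics import median


def is_inappropriate(ratings, threshold=3):
    """Hypothetical simplified rule: a scenario is classified as inappropriate
    when the median of the panel's 1-9 ratings is at or below `threshold`.
    (The full RAND/UCLA method also handles panel disagreement; omitted here.)"""
    return median(ratings) <= threshold


def pairwise_agreement(labels_by_panel):
    """Overall agreement for every pair of panels: the share of scenarios
    that both panels classified the same way."""
    result = {}
    for (i, a), (j, b) in combinations(enumerate(labels_by_panel), 2):
        result[(i, j)] = sum(x == y for x, y in zip(a, b)) / len(a)
    return result


def fleiss_kappa(labels_by_panel):
    """Fleiss' kappa for a binary classification made by m panels
    (treated as raters) on the same n scenarios."""
    m = len(labels_by_panel)
    n = len(labels_by_panel[0])
    # Number of panels that labelled each scenario True.
    counts = [sum(panel[k] for panel in labels_by_panel) for k in range(n)]
    # Per-scenario observed agreement: fraction of ordered panel pairs that agree.
    p_obs = [(c * (c - 1) + (m - c) * (m - c - 1)) / (m * (m - 1)) for c in counts]
    p_bar = sum(p_obs) / n
    # Chance agreement from the overall proportion of True labels.
    p_true = sum(counts) / (n * m)
    p_chance = p_true ** 2 + (1 - p_true) ** 2
    if p_chance == 1:
        raise ValueError("kappa is undefined when every label is identical")
    return (p_bar - p_chance) / (1 - p_chance)


# Toy data (hypothetical): each panel's inappropriate / not-inappropriate call
# for the same six scenarios.
panels = [
    [True, True, False, False, False, True],
    [True, False, False, False, False, True],
    [True, True, False, False, True, True],
]

if __name__ == "__main__":
    print(pairwise_agreement(panels))
    print(round(fleiss_kappa(panels), 3))
```

On the toy data the pairwise agreement rates are 5/6, 5/6 and 4/6, and kappa is 5/9 (about 0.56). Because kappa subtracts the agreement expected by chance from the observed agreement, high raw agreement (for example, 95 percent) can coexist with only a moderate kappa when most scenarios fall into one category.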