Measuring the Quality of Physician Practice by Using Clinical Vignettes: A Prospective Validation Study

Accurate, affordable, and valid measurements of clinical practice are the basis for quality-of-care assessments (1). To date, however, most measurement tools rely on incomplete data sources, such as medical records or administrative data; require highly trained and expensive personnel to implement; and are difficult to validate (2-5). Comparisons of clinical practice across different sites and health care systems are also difficult because they require relatively complex instrument designs or statistical techniques to adjust for variations in case mix among the underlying patient populations (6, 7).

We have developed a measurement tool, computerized clinical vignettes, that overcomes these limitations and measures physicians' clinical practice against a predefined set of explicit quality criteria. These vignettes simulate patient visits and can be given to physicians to measure their ability to evaluate, diagnose, and treat specific medical conditions. Each vignette-simulated case contains realistic clinical detail, allowing an identical clinical scenario to be presented to many physicians. Each physician can be asked to complete several vignettes simulating diverse clinical conditions. This instrument design obviates the need to adjust quality scores for the variation in disease severity and comorbid conditions found in actual patient populations. Our vignettes are also distinct from other quality measurements of clinical practice because they do not focus on a single task, or even a limited set of tasks, but instead comprehensively evaluate the range of skills needed to care for a patient. Vignettes are particularly well suited for quality assessments of clinical practice that are used for large-scale (8, 9), cross-system comparisons (10, 11) or for cases in which ethical issues preclude involving patients or their records (7, 12, 13).
They are also ideal for evaluations that require holding patient variation constant (14, 15) or manipulating patient-level variables (15-17). The appeal of vignettes has resulted in their extensive use in medical school education (18, 19), as well as in studies that explicitly evaluate the quality of clinical practice in real-life settings and in comparative analyses among national health care systems (10, 20-23).

Before vignette-measured quality can be used confidently in these settings, however, 2 important questions must be answered: How valid are vignettes as a measure of actual clinical practice? Can vignettes discriminate among variations in the quality of clinical practice? The first question has led to a search to define a gold standard for validation (24-26). We and others have used standardized patients as this standard. Standardized patients are trained actors who present unannounced to outpatient clinics as patients with a given clinical condition. Immediately after meeting with a physician, the standardized patient records on a checklist what the physician did during the visit (26-28). Rigorous methods, which we have described in detail elsewhere (29), ensure that standardized patients can be considered a gold standard. In addition, we have demonstrated the validity of standardized patients as a gold standard by concealing audio recorders on standardized patients during visits. The overall rate of agreement between the standardized patients' checklists and the independent assessment of the audio transcripts was 91% (26). We previously used paper-and-pen vignettes in a study limited to a single health care system, the Veterans Administration, and found that they seemed to be a valid measure of the quality of clinical practice according to their rate of agreement with standardized patient checklists (26).
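The agreement rate described above is simply the fraction of checklist items on which two independent assessments of the same visit concur. A minimal sketch, using entirely hypothetical checklist data (the study's actual checklists and items are not shown here):

```python
def agreement_rate(checklist: list[bool], transcript: list[bool]) -> float:
    """Fraction of items on which two assessments of the same visit agree."""
    if len(checklist) != len(transcript):
        raise ValueError("assessments must cover the same items")
    matches = sum(a == b for a, b in zip(checklist, transcript))
    return matches / len(checklist)

# Hypothetical 10-item checklist for one visit: the standardized patient's
# record and the audio-transcript review agree on 9 of 10 items.
sp_checklist = [True, True, False, True, True, True, False, True, True, True]
audio_review = [True, True, False, True, False, True, False, True, True, True]
print(f"{agreement_rate(sp_checklist, audio_review):.0%}")  # prints "90%"
```

An overall rate, as reported in the study, would pool such item-level comparisons across all visits rather than averaging per-visit rates.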
For this study, we wanted to confirm the validity of vignettes by using a more complex study design that introduced many more randomly assigned physicians, a broader range of clinical cases, and several sites representing different health care systems. We also wanted to test a refined, computerized version of the vignettes, which we believe is more realistic and streamlines data collection and scoring. We were particularly interested in determining whether vignettes accurately capture variation in the quality of clinical practice, which has become increasingly prominent in the national debate on quality of care (30, 31). We hoped that vignettes could contribute to this debate by providing a low-cost measure of variation across different health care systems.

Methods

Sites

The study was conducted in 4 general internal medicine clinics: 2 Veterans Affairs (VA) medical centers and 2 large, private medical centers. One private site is a closed group model; the other, primarily staffed by employed physicians, contracts with managed care plans. All sites are located in California, and each has an internal medicine residency training program. One VA medical center and 1 private site are located in each of 2 cities. The 2 VA medical centers are large, academically affiliated hospitals with large primary care general internal medicine practices. We chose 2 private sites that were generally similar to the VA medical centers and to each other; each had large primary care practices and capitated reimbursement systems that give primary care general internists a broad scope of clinical decision-making authority.

Study Design

At each site, all attending physicians and second- and third-year residents who were actively engaged in the care of general internal medicine outpatients were eligible to participate in the study. We excluded only interns. Of 163 eligible physicians, 144 agreed to participate.
We informed consenting physicians that 6 to 10 standardized patients might be introduced unannounced into their clinics over the course of a year and that they might be asked to complete an equal number of vignettes. Sixty physicians were randomly selected to see standardized patients: 5 physicians from each of the 3 training levels at each of the 4 sites (Figure 1). We assigned standardized patients to each selected physician for 8 clinical cases: simple and complex cases of chronic obstructive pulmonary disease, diabetes, vascular disease, and depression. We abstracted the medical records from the 480 standardized patient visits. Each selected physician also completed a computerized clinical vignette for each of the 8 cases. For standardized patient visits that a selected physician did not complete, a replacement physician, randomly selected from the same training level at the same site, completed the visit. Eleven physicians required replacements; the 11 replacement physicians completed 24 standardized patient visits, and each completed vignettes for all 8 cases. Finally, we randomly selected 45 additional physicians to serve as controls and complete vignettes (only) for all 8 cases. In total, 116 physicians participated in the study by seeing standardized patients, completing vignettes, or both. Standardized patients presented to the clinics between March and July 2000, and physicians completed vignettes between May and August 2000.

Figure 1. Planned study design showing sites and physician sample by level of training and clinical case for the 3 quality measurement methods.

Vignette Data Collection

We developed the vignettes by using a standardized protocol. We first selected relatively common medical conditions frequently seen by internists.
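The sampling arithmetic behind the design above can be sketched as follows; the site, training-level, and case labels are placeholders for illustration, not the study's actual identifiers:

```python
from itertools import product

# Placeholder labels for the study's 4 sites, 3 training levels, and
# 4 conditions, each with a simple and a complex form.
sites = ["VA-1", "VA-2", "Private-1", "Private-2"]
training_levels = ["second-year resident", "third-year resident", "attending"]
conditions = ["COPD", "diabetes", "vascular disease", "depression"]
cases = [f"{dx} ({form})" for dx, form in product(conditions, ["simple", "complex"])]

physicians_per_cell = 5  # 5 physicians per training level per site

selected = physicians_per_cell * len(sites) * len(training_levels)  # 60 physicians
sp_visits = selected * len(cases)                                   # 480 visits

print(len(cases), selected, sp_visits)  # prints "8 60 480"
```

Because every selected physician faces all 8 cases (by standardized patient and by vignette), case mix is held constant by construction, which is the design feature the text emphasizes.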
All selected conditions had explicit, evidence-based quality criteria and accepted standards of practice that could be used to score the vignettes and that could also be measured by standardized patients and chart abstraction. We developed written scenarios that described a typical patient with 1 of the same 4 diseases (chronic obstructive pulmonary disease, diabetes, vascular disease, or depression). For each disease, we developed a simple (uncomplicated) case and a more complex case with a comorbid condition of either hypertension or hypercholesterolemia, yielding a total of 8 clinical cases. (A sample vignette and scoring sheet are available online.)

Supplement. Appendix Figure: Vignette scoring sheet. Published online with permission from John W. Peabody, MD, PhD.

The physician completing the vignette sees the patient on a computer. Each vignette is organized into 5 sections, or domains, which, when completed in sequential order, recreate the normal sequence of events in an actual patient visit: taking the patient's history, performing the physical examination, ordering radiologic or laboratory tests, making a diagnosis, and administering a treatment plan. For example, the computerized vignette first states the presenting problem and prompts the physician to take the patient's history (that is, to ask questions that would determine the history of the present illness; past medical history, including prevention; and social history). Physicians can record components of the history in any order without penalty. The entire format is open-ended: The physician enters the history questions directly into the computer and, in the most recent computerized versions, receives real-time responses. When the history is completed, the computer confirms that the physician has finished and then provides key responses typical of a patient with the specific case. The same process is repeated for the 4 remaining domains.
In addition to the open-ended format, we have taken 3 steps to avoid potential inflation of vignette scores. First, physicians are not allowed to return to a previous domain and change their queries after they have seen the computerized response. Second, the number of queries is limited in the history and physical examination domains; for example, in the physical examination domain, physicians are asked to list only the 6 to 10 essential elements of the examination that they would perform. Third, physicians are given limited time to complete the vignette (just as time is limited during an actual patient visit).
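The three safeguards above can be sketched as enforcement rules. This is a hypothetical illustration only, not the study's actual software; the domain names follow the text, but the entry caps and time limit shown are invented for the example:

```python
import time

# The 5 domains completed in the fixed order described in the text.
DOMAINS = ["history", "physical examination", "tests", "diagnosis", "treatment"]
MAX_ENTRIES = {"history": 20, "physical examination": 10}  # illustrative caps
TIME_LIMIT_S = 20 * 60  # illustrative overall time limit

class Vignette:
    def __init__(self) -> None:
        self.start = time.monotonic()
        self.current = 0  # index into DOMAINS; only moves forward
        self.entries = {d: [] for d in DOMAINS}

    def record(self, domain: str, entry: str) -> None:
        # Safeguard 3: limited total time.
        if time.monotonic() - self.start > TIME_LIMIT_S:
            raise TimeoutError("time limit reached")
        # Safeguard 1: no returning to a completed domain.
        if DOMAINS.index(domain) != self.current:
            raise ValueError("only the current domain accepts entries")
        # Safeguard 2: capped number of queries in some domains.
        cap = MAX_ENTRIES.get(domain)
        if cap is not None and len(self.entries[domain]) >= cap:
            raise ValueError(f"entry limit reached for {domain}")
        self.entries[domain].append(entry)

    def finish_domain(self) -> None:
        # Advancing reveals the computer's key responses; earlier
        # queries can no longer be revised.
        self.current += 1
```

The one-way `finish_domain` step is what prevents physicians from refining earlier queries after seeing the simulated patient's responses.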

[1] R. Davis, et al. Minor head injury in children: current management practices of pediatricians, emergency physicians, and family physicians. 1998, Archives of pediatrics & adolescent medicine.

[2] Carmel M. Martin, et al. Chronic illness care as a balancing act. A qualitative study. 2002, Australian family physician.

[3] Alastair Baker, et al. Crossing the Quality Chasm: A New Health System for the 21st Century, 2001, BMJ: British Medical Journal.

[4] L. Berg, et al. Cross-national interrater reliability of dementia diagnosis in the elderly and factors associated with disagreement, 1996, Neurology.

[5] K. Weinfurt, et al. Are psychiatrists' characteristics related to how they care for depression in the medically ill? Results from a national case-vignette survey. 2001, Psychosomatics.

[6] Jesse Green, et al. How Accurate are Hospital Discharge Data for Evaluating Effectiveness of Care? 1993, Medical care.

[7] F Sturmans, et al. Does competence of general practitioners predict their performance? Comparison between examination setting and actual practice. 1991, BMJ.

[8] J W Peabody, et al. Comparison of vignettes, standardized patients, and chart abstraction: a prospective validation study of 3 methods for measuring quality. 2000, JAMA.

[9] P G Shekelle, et al. Are nonspecific practice guidelines potentially harmful? A randomized comparison of the effect of nonspecific versus specific guidelines on physician decision making. 2000, Health services research.

[10] K. Gorter, et al. Variation in diagnosis and management of common foot problems by GPs. 2001, Family practice.

[11] P. Glassman, et al. Using standardized patients to measure quality: evidence from the literature and a prospective study. 2000, The Joint Commission journal on quality improvement.

[12] R. Nordyke. Determinants of PHC productivity and resource utilization: a comparison of public and private physicians in Macedonia. 2002, Health policy.

[13] P. Carney, et al. Using unannounced standardized patients to assess the HIV preventive practices of family nurse practitioners and family physicians. 1998, The Nurse practitioner.

[14] G. Guyatt, et al. Variability in physicians' decisions on caring for chronically ill elderly patients: an international study. 1991, CMAJ: Canadian Medical Association journal = journal de l'Association medicale canadienne.

[15] S. Jain, et al. Assessing the Accuracy of Administrative Data in Health Information Systems, 2004, Medical care.

[16] J P Rissing, et al. Physician and coding errors in patient records. 1985, JAMA.

[17] D. McClish, et al. An International Comparison of Physicians' Judgments of Outcome Rates of Cardiac Procedures and Attitudes toward Risk, Uncertainty, Justifiability, and Regret, 1998, Medical decision making: an international journal of the Society for Medical Decision Making.

[18] Elizabeth A McGlynn, et al. There is no perfect health system. 2004, Health affairs.

[19] L. Davies, et al. Laboratory expenditure in Pegasus Medical Group: a comparison of high and low users of laboratory tests with academics. 2000, The New Zealand medical journal.

[20] D Spruijt-Metz, et al. Variation in diagnoses: influence of specialists' training on selecting and ranking relevant information in geriatric case vignettes. 1996, Social science & medicine.

[21] N. Powe, et al. Relation between pediatric experience and treatment recommendations for children and adolescents with kidney failure. 2001, JAMA.

[22] A. Alterman, et al. The use of case vignettes for Addiction Severity Index training. 1997, Journal of substance abuse treatment.

[23] A. Lawthers, et al. Designing and using measures of quality based on physician office records, 1995, The Journal of ambulatory care management.

[24] A. Rosen, et al. The importance of severity of illness adjustment in predicting adverse outcomes in the Medicare population. 1995, Journal of clinical epidemiology.

[25] C. Gordon, et al. An examination of the attitudes and practice of general practitioners in the diagnosis and treatment of depression in older people, 2002, International journal of geriatric psychiatry.

[26] L. Shields, et al. Qualitative analysis of the care of children in hospital in four countries-Part 1. 2001, Journal of pediatric nursing.

[27] H. Sandvik. Criterion validity of responses to patient vignettes: an analysis based on management of female urinary incontinence. 1995, Family medicine.

[28] J. Avorn, et al. Clinical decision-making in the evaluation and treatment of insomnia. 1990, The American journal of medicine.

[29] K. Svärdsudd, et al. Variations in sick-listing practice among male and female physicians of different specialities based on case vignettes. 2000, Scandinavian journal of primary health care.

[30] P. Brann, et al. Routine Outcome Measurement in a Child and Adolescent Mental Health Service: An Evaluation of HoNOSCA, 2001, The Australian and New Zealand journal of psychiatry.

[31] P Glassman, et al. How well does chart abstraction measure quality? A prospective comparison of standardized patients with the medical record. 2000, The American journal of medicine.

[32] J. Muñoz, et al. Using vignettes to compare the quality of clinical care variation in economically divergent countries. 2004, Health services research.

[33] A. Stillman, et al. Are critically ill older patients treated differently than similarly ill younger patients? 1998, The Western journal of medicine.

[34] F. Degruy, et al. Stability of standardized patients' performance in a study of clinical decision making. 1995, Family medicine.

[35] B. Fowers, et al. His and her individualisms? Sex bias and individualism in psychologists' responses to case vignettes. 1996, The Journal of psychology.

[36] J. Luck, et al. Using standardised patients to measure physicians' practice: validation study using audio recordings, 2002, BMJ: British Medical Journal.

[37] S. Hazelett, et al. Patients' behavior at the time of injury: effect on nurses' perception of pain level and subsequent treatment. 2002, Pain management nursing: official journal of the American Society of Pain Management Nurses.

[38] J. Luck, et al. Measuring compliance with preventive care guidelines: standardized patients, clinical vignettes, and the medical record. 2000, Journal of general internal medicine.

[39] M. Hugo. Mental health professionals' attitudes towards people who have experienced a mental health disorder. 2001, Journal of psychiatric and mental health nursing.

[40] E Martin, et al. To what extent do clinical notes by general practitioners reflect actual medical performance? A study using simulated patients. 1994, The British journal of general practice: the journal of the Royal College of General Practitioners.

[41] R. Dickson, et al. Hormonal side effects in women: typical versus atypical antipsychotic treatment. 2000, The Journal of clinical psychiatry.

[42] T. Morita, et al. Practices and attitudes of Japanese oncologists and palliative care physicians concerning terminal sedation: a nationwide survey. 2002, Journal of clinical oncology: official journal of the American Society of Clinical Oncology.

[43] J. Chappel. Educational approaches to prescribing practices and substance abuse. 1991, Journal of psychoactive drugs.

[44] M. Cornfeld, et al. Accuracy of cancer-risk assessment in primary care practice. 2001, Journal of cancer education: the official journal of the American Association for Cancer Education.

[45] T. Quinn, et al. Determining patients' suitability for thrombolysis: coronary care nurses' agreement with an expert cardiological 'gold standard' as assessed by clinical and electrocardiographic 'vignettes'. 1998, Intensive & critical care nursing.

[46] E. McGlynn, et al. The quality of health care delivered to adults in the United States. 2003, The New England journal of medicine.

[47] M. Nendaz, et al. The Patient Findings Questionnaire: one solution to an important standardized patient examination problem. 1999, Academic medicine: journal of the Association of American Medical Colleges.

[48] D. Gould. Using vignettes to collect data for nursing research studies: how valid are the findings? 1996, Journal of clinical nursing.

[49] N. Farber, et al. Residents' prescription writing for nonpatients. 2002, JAMA.

[50] M Wilkes, et al. A Windows-based tool for the study of clinical decision-making. 1995, Medinfo. MEDINFO.

[51] Michael F. Green, et al. Factors affecting reliability and confidence of DSM-III-R psychosis-related diagnosis, 2001, Psychiatry Research.

[52] M. Mayo-Smith, et al. Differences in Generalists' and Cardiologists' Perceptions of Cardiovascular Risk and the Outcomes of Preventive Therapy in Cardiovascular Disease, 1996, Annals of Internal Medicine.

[53] J. Bring, et al. How do GPs use clinical information in their judgements of heart failure? A clinical judgement analysis study. 1998, Scandinavian journal of primary health care.

[54] S. Fihn. The quest to quantify quality. 2000, JAMA.

[55] L. Muhe, et al. Quality of hospital care for seriously ill children in less-developed countries, 2001, The Lancet.

[56] J. Rethans, et al. Assessment of practicing family physicians: comparison of observation in a multiple-station examination using standardized patients with observation of consultations in daily practice. 1999, Academic medicine: journal of the Association of American Medical Colleges.

[57] J D Kleinke, et al. Release 0.0: clinical information technology in the real world. 1998, Health affairs.

[58] O. Hnatiuk, et al. Do specialists differ on do-not-resuscitate decisions? 2002, Chest.

[59] M. Huby, et al. The application of vignettes in social and nursing research. 2002, Journal of advanced nursing.