Accuracy of Electronic Health Record Data for Identifying Stroke Cases in Large-Scale Epidemiological Studies: A Systematic Review from the UK Biobank Stroke Outcomes Group

Objective: Long-term follow-up of population-based prospective studies is often achieved through linkages to coded regional or national health care data. Our knowledge of the accuracy of such data is incomplete. To inform methods for identifying stroke cases in UK Biobank (a prospective study of 503,000 UK adults recruited in middle age), we systematically evaluated the accuracy of these data for stroke and its main pathological types (ischaemic stroke, intracerebral haemorrhage, subarachnoid haemorrhage), determining the optimum codes for case identification.

Methods: We sought studies published from 1990 to November 2013 that compared coded data from death certificates, hospital admissions or primary care with a reference standard for stroke or its pathological types. We extracted information on a range of study characteristics and assessed study quality with the Quality Assessment of Diagnostic Accuracy Studies tool (QUADAS-2). To assess accuracy, we extracted data on positive predictive values (PPV) and, where available, on sensitivity, specificity, and negative predictive values (NPV).

Results: Of 39 eligible studies, 37 assessed the accuracy of International Classification of Diseases (ICD)-coded hospital or death certificate data. The studies varied widely in their settings, methods, reporting, quality, and in the choice and accuracy of codes. Although PPVs for stroke and its pathological types ranged from 6–97%, appropriately selected, stroke-specific codes (rather than broad cerebrovascular codes) consistently produced PPVs >70%, and in several studies >90%. The few studies with data on sensitivity, specificity and NPV showed higher sensitivity of hospital versus death certificate data for stroke, with specificity and NPV consistently >96%. Few studies assessed either primary care data or combinations of data sources.

Conclusions: Particular stroke-specific codes can yield high PPVs (>90%) for stroke and its pathological types. Including primary care data and combining data sources should improve accuracy in large epidemiological studies, but there is limited published information about these strategies.
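
The four accuracy measures extracted in this review (PPV, NPV, sensitivity, specificity) all derive from a 2x2 table that cross-classifies the coded diagnosis against the reference standard. The following is a minimal Python sketch of those standard definitions, not code from the review itself. The class name `TwoByTwo`, the function `accuracy_measures`, and the example counts are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class TwoByTwo:
    """Counts from comparing coded data with a reference standard (hypothetical container)."""
    tp: int  # stroke code present, stroke confirmed by reference standard
    fp: int  # stroke code present, stroke not confirmed
    fn: int  # no stroke code, stroke confirmed by reference standard
    tn: int  # no stroke code, no stroke


def _ratio(numerator: int, denominator: int) -> Optional[float]:
    """Return numerator/denominator, or None when the denominator is zero."""
    return numerator / denominator if denominator else None


def accuracy_measures(t: TwoByTwo) -> Dict[str, Optional[float]]:
    """Standard diagnostic accuracy measures as proportions in [0, 1]."""
    return {
        "ppv": _ratio(t.tp, t.tp + t.fp),          # coded cases that are true cases
        "npv": _ratio(t.tn, t.tn + t.fn),          # uncoded people who are true non-cases
        "sensitivity": _ratio(t.tp, t.tp + t.fn),  # true cases that carry a code
        "specificity": _ratio(t.tn, t.tn + t.fp),  # true non-cases without a code
    }


# Illustrative counts only; these are not data from any study in the review.
example = TwoByTwo(tp=90, fp=10, fn=30, tn=870)
measures = accuracy_measures(example)
for name, value in measures.items():
    print(f"{name}: {value:.3f}")
```

PPV needs only the code-positive records to be checked against the reference standard, whereas sensitivity, specificity and NPV also require verification of people without a stroke code. This is one plausible reason why the latter three measures are reported less often.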
