Defining Disease Phenotypes Using National Linked Electronic Health Records: A Case Study of Atrial Fibrillation

Background: National electronic health records (EHR) are increasingly used for research, but identifying disease cases is challenging because the information captured differs between sources (e.g. primary and secondary care). Our objective was to provide a transparent, reproducible model for integrating these data, using atrial fibrillation (AF), a chronic condition diagnosed and managed in multiple ways in different healthcare settings, as a case study.

Methods: Potentially relevant codes for AF screening, diagnosis, and management were identified in four coding systems: Read (primary care diagnoses and procedures), British National Formulary (BNF; primary care prescriptions), ICD-10 (secondary care diagnoses), and OPCS-4 (secondary care procedures). From these, we developed a phenotype algorithm via expert review and analysis of linked EHR data from 1998 to 2010 for a cohort of 2.14 million UK patients aged ≥30 years. The cohort was also used to evaluate the phenotype by examining associations between incident AF and known risk factors.

Results: The phenotype algorithm incorporated 286 codes: 201 Read, 63 BNF, 18 ICD-10, and 4 OPCS-4. Incident AF diagnoses were recorded for 72,793 patients, but only 39.6% (N = 28,795) were recorded in both primary care and secondary care. An additional 7,468 potential cases were inferred from data on treatment and pre-existing conditions. The proportion of cases identified from each source differed by age at diagnosis: inferred diagnoses contributed a greater proportion of younger cases (≤60 years), while older patients (≥80 years) were mainly diagnosed in secondary care. Associations of risk factors (hypertension, myocardial infarction, heart failure) with incident AF defined using different EHR sources were comparable in magnitude to those from traditional consented cohorts.

Conclusions: A single EHR source is not sufficient to identify all patients, nor will it provide a representative sample. Combining multiple data sources and integrating information on treatment and comorbid conditions can substantially improve case identification.
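As an illustration only, the idea of combining sources can be sketched in a few lines of Python. This is not the published phenotype algorithm: the code lists below are hypothetical placeholders standing in for the 286 real codes, the function name `classify_patient` is invented for this sketch, and the rule for "inferred" cases is a deliberate simplification (any treatment or procedure code without a recorded diagnosis).

```python
# Hypothetical placeholder code lists; the real phenotype uses 286 codes
# across Read, BNF, ICD-10 and OPCS-4.
DIAGNOSIS_CODES = {
    "primary_care": {"READ_AF_DIAGNOSIS"},     # stands in for Read diagnosis codes
    "secondary_care": {"ICD10_AF_DIAGNOSIS"},  # stands in for ICD-10 diagnosis codes
}
# Stands in for BNF prescription and OPCS-4 procedure codes (assumption:
# any of these alone is treated as evidence for an inferred case).
SUPPORTING_CODES = {"BNF_AF_TREATMENT", "OPCS4_AF_PROCEDURE"}


def classify_patient(records):
    """Classify one patient from (source, code) pairs.

    Returns "both", "primary_care_only", "secondary_care_only",
    "inferred", or "no_af".
    """
    records = list(records)
    # Sources in which an explicit AF diagnosis code was recorded.
    diagnosed_in = {
        source
        for source, code in records
        if code in DIAGNOSIS_CODES.get(source, set())
    }
    if diagnosed_in == {"primary_care", "secondary_care"}:
        return "both"
    if diagnosed_in:
        return next(iter(diagnosed_in)) + "_only"
    # No diagnosis code: fall back to treatment or procedure evidence.
    if any(code in SUPPORTING_CODES for _, code in records):
        return "inferred"
    return "no_af"


if __name__ == "__main__":
    example = [
        ("primary_care", "READ_AF_DIAGNOSIS"),
        ("secondary_care", "ICD10_AF_DIAGNOSIS"),
    ]
    print(classify_patient(example))
```

Keeping the source of each diagnosis explicit, rather than collapsing to a yes/no flag, is what allows the proportion of cases contributed by each source to be compared across age groups.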
