Linking electronic health records for research on a nationwide cohort including over 54 million people in England

Objectives: Describe a new England-wide electronic health record (EHR) resource enabling whole population research on Covid-19 and cardiovascular disease whilst ensuring data security and privacy and maintaining public trust. Design: Cohort comprising linked person-level records from national healthcare settings for the English population accessible within the new NHS Digital Trusted Research Environment. Setting: EHRs from primary care, hospital episodes, death registry, Covid-19 laboratory test results and community dispensing data, with further enrichment planned from specialist intensive care, cardiovascular and Covid-19 vaccination data. Participants: 54.4 million people alive on 1st January 2020 and registered with an NHS general practitioner in England. Main measures of interest: Confirmed and suspected Covid-19 diagnoses, exemplar cardiovascular conditions (incident stroke or transient ischaemic attack (TIA) and incident myocardial infarction (MI)) and all-cause mortality between 1st January and 31st October 2020. Results: The linked cohort includes over 96% of the English population. By combining person-level data across national healthcare settings, data on age, sex and ethnicity are complete for over 95% of the population. Among 53.2M people with no prior diagnosis of stroke/TIA, 98,721 had an incident stroke/TIA, of which 30% were recorded only in primary care and 4% only in death registry records. Among 53.1M people with no prior history of MI, 62,966 had an incident MI, of which 8% were recorded only in primary care and 12% only in death records. A total of 959,067 people had a confirmed or suspected Covid-19 diagnosis (714,162 in primary care data, 126,349 in hospital admission records, 776,503 in Covid-19 laboratory test data and 48,433 in death registry records). While 58% of these were recorded in both primary care and Covid-19 laboratory test data, 15% were recorded only in primary care and 18% only in Covid-19 laboratory test data.
Conclusions: This population-wide resource demonstrates the importance of linking person-level data across health settings to maximize completeness of key characteristics and to ascertain cardiovascular events and Covid-19 diagnoses. Although established initially to support research on Covid-19 and cardiovascular disease to benefit clinical care and public health and to inform health care policy, it can broaden further to enable a very wide range of research.
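
The "recorded only in one source" percentages reported in the Results can be illustrated with a minimal sketch of how ascertainment might be tabulated across linked sources. This is not the study's code: the function names (`source_overlap`, `only_in`), the source labels and the person identifiers are all hypothetical, and the toy data bear no relation to the reported figures.

```python
from collections import Counter


def source_overlap(sources):
    """Count people by the exact combination of sources that recorded them.

    `sources` maps a source name to the set of person IDs it recorded.
    Returns (Counter keyed by frozenset of source names, total people).
    """
    all_ids = set().union(*sources.values())
    combos = Counter()
    for pid in all_ids:
        key = frozenset(name for name, ids in sources.items() if pid in ids)
        combos[key] += 1
    return combos, len(all_ids)


def only_in(combos, name):
    """Number of people recorded in `name` and in no other source."""
    return combos.get(frozenset([name]), 0)


# Toy person IDs (hypothetical, for illustration only).
toy = {
    "primary_care": {1, 2, 3, 4, 5, 6},
    "hospital": {2, 7},
    "lab_tests": {3, 4, 5, 6, 8, 9},
    "deaths": {2, 10},
}

combos, total = source_overlap(toy)
for name in toy:
    share = 100 * only_in(combos, name) / total
    print(f"{name}: {share:.0f}% recorded only here")
```

Keying the counts on the exact set of sources keeps "recorded in both" and "recorded in only one" mutually exclusive, so the shares across all combinations sum to 100%.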
