Linked electronic health records for research on a nationwide cohort of more than 54 million people in England: data resource

Abstract

Objective
To describe a novel England-wide electronic health record (EHR) resource enabling whole population research on covid-19 and cardiovascular disease while ensuring data security and privacy and maintaining public trust.

Design
Data resource comprising linked person level records from national healthcare settings for the English population, accessible within NHS Digital’s new trusted research environment.

Setting
EHRs from primary care, hospital episodes, death registry, covid-19 laboratory test results, and community dispensing data, with further enrichment planned from specialist intensive care, cardiovascular, and covid-19 vaccination data.

Participants
54.4 million people alive on 1 January 2020 and registered with an NHS general practitioner in England.

Main measures of interest
Confirmed and suspected covid-19 diagnoses, exemplar cardiovascular conditions (incident stroke or transient ischaemic attack and incident myocardial infarction) and all cause mortality between 1 January and 31 October 2020.

Results
The linked cohort includes more than 96% of the English population. By combining person level data across national healthcare settings, data on age, sex, and ethnicity are complete for around 95% of the population. Among 53.3 million people with no previous diagnosis of stroke or transient ischaemic attack, 98 721 had a first ever incident stroke or transient ischaemic attack between 1 January and 31 October 2020, of which 30% were recorded only in primary care and 4% only in death registry records. Among 53.2 million people with no previous diagnosis of myocardial infarction, 62 966 had an incident myocardial infarction during follow-up, of which 8% were recorded only in primary care and 12% only in death registry records.

A total of 959 470 people had a confirmed or suspected covid-19 diagnosis (714 162 in primary care data, 126 349 in hospital admission records, 776 503 in covid-19 laboratory test data, and 50 504 in death registry records). Although 58% of these were recorded in both primary care and covid-19 laboratory test data, 15% and 18%, respectively, were recorded in only one.

Conclusions
This population-wide resource shows the importance of linking person level data across health settings to maximise completeness of key characteristics and to ascertain cardiovascular events and covid-19 diagnoses. Although this resource was initially established to support research on covid-19 and cardiovascular disease to benefit clinical care and public health and to inform healthcare policy, it can broaden further to enable a wide range of research.
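The "recorded only in" percentages reported in the Results depend on knowing, for each person, which linked sources contain a qualifying record. The following is a minimal sketch of that tabulation in Python, not the authors' pipeline. The function name `source_breakdown`, the source labels, and the toy records are all hypothetical and chosen only for illustration; the real analysis runs on person level data inside the trusted research environment.

```python
from collections import Counter


def source_breakdown(records):
    """Count people with an event and how many appear in exactly one source.

    `records` is an iterable of (person_id, source) pairs. This layout is a
    hypothetical simplification of linked person level data.
    """
    sources_by_person = {}
    for person_id, source in records:
        # A set absorbs repeated records for the same person and source.
        sources_by_person.setdefault(person_id, set()).add(source)

    total = len(sources_by_person)
    only_in = Counter()
    for sources in sources_by_person.values():
        if len(sources) == 1:
            only_in[next(iter(sources))] += 1
    return total, dict(only_in)


# Toy, made-up records: five people, three illustrative sources.
toy_records = [
    ("p1", "primary_care"), ("p1", "hospital"), ("p1", "primary_care"),
    ("p2", "primary_care"),
    ("p3", "death_registry"),
    ("p4", "hospital"), ("p4", "death_registry"),
    ("p5", "primary_care"),
]

total, only_in = source_breakdown(toy_records)
for source, n in sorted(only_in.items()):
    print(f"{source}: {n}/{total} = {100 * n / total:.0f}% recorded only here")
```

In this toy example, anyone counted under `only_in` would be missed if that source were left out of the linkage, which is the point the abstract makes about ascertaining events across health settings.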

Spiros C. Denaxas | C. Sudlow | J. Danesh | A. Hansell | E. Birney | R. Payne | A. Banerjee | K. Khunti | J. Sterne | A. Wood | R. Kolamunnage-Dona | M. Pirmohamed | M. Gerstung | C. Wolfe | S. Bacon | B. Goldacre | M. Katsoulis | F. Greaves | G. Davies | S. Padmanabhan | M. Bennie | M. Gravenor | F. Kee | E. Di Angelantonio | G. Nicholson | D. Cromwell | P. Lorgelly | A. Douiri | M. Barber | F. Zaccardi | M. Mamas | J. Thygesen | R. Sofat | L. Pasea | H. Hemingway | Ben Cairns | A. Docherty | W. Whiteley | Fabian Falck | J. Halcox | T. Lawrence | J. Dennis | R. Denholm | Jianhua Wu | B. Doble | M. Buch | S. Babu-Narayan | Honghan Wu | Spencer J Keene | E. Nikiphorou | C. Dale | E. Morris | A. Akbari | V. Walker | L. Wright | Susheel Varma | F. Falter | Adrian Jonas | I. Mordi | B. Mateen | R. Goldacre | A. Kurdi | M. Mizani | S. Kent | F. Torabi | T. Palmer | S. Onida | D. Harris | M. Skrypak | A. Pherwani | R. Lyons | T. Norris | Jennifer Beveridge | K. Brown | K. Kavanagh | C. Berry | J. Lyons | A. Lai | R. Griffiths | L. Pierotti | Abdul Qadr Akinoso-Imran | M. Glickman | C. Lawson | G. Curry | Craig Smith | S. Hollings | Dan O’Connell | Eloise Withnell | V. Nafilyan | T. Wilkinson | S. Salim | Carole Morris | L. North | Ken Li | M. MacLeod | J. Cooper | Ashkan Dashtban | Samantha Ip | B. Bray | B. Humberstone | R. Priedon | L. Morrice | Debbie Ringham | Brian Roberts | Huan Wang | Haoting Zhang | Nilesh J Samani | H. Abbasizanjani | Tianxiao Wang | O. Seminog | H. Wilde | Christopher Tomlinson | A. Handy | David Brind | R. Carragher | Alun H Davies | David Hughes | Deborah Lawler | Qiuju Li | Deborah Lowe | P. Machado | Sinduja Manohar | David Moreno Martos | H. Tang | M. Inouye | N. Hall | Clea du Toit | Lydia Martin | Jon Boyle | A. Shah | Massimo Caputo | Jessica Barrett | K. Cheema | N. Herz | R. Takhar | C. Rogers | J. MacArthur | L. Ellins | N. Davies | Daniel O'Connell | Jessica K. Barrett | Ben Humberstone
