Probabilistic linking to enhance deterministic algorithms and reduce linkage errors in hospital administrative data

Background: The pseudonymisation algorithm used to link together episodes of care belonging to the same patient in England [Hospital Episode Statistics ID (HESID)] has never undergone a formal evaluation to determine the extent of data linkage error.

Objective: To quantify the improvement in linkage accuracy gained by adding a probabilistic linkage step to the existing deterministic HESID algorithms.

Methods: We used inpatient admissions to National Health Service (NHS) hospitals in England (HES) over 17 years (1998 to 2015) for a sample of patients born on the 13th or 28th of any month in 1992, 1998, 2005 or 2012. We compared the existing deterministic algorithm with one that included an additional probabilistic step, against a reference standard created using enhanced probabilistic matching with additional clinical and demographic information. Missed and false matches were quantified, and their impact on estimates of hospital readmission within one year was determined.

Results: HESID produced a high missed match rate, which improved over time (from 8.6% in 1998 to 0.4% in 2015). Missed matches were more common for ethnic minorities, patients living in areas of high socio-economic deprivation, foreign patients and those with ‘no fixed abode’. Missed matches biased estimates of the readmission rate for several patient groups, and adding the probabilistic step reduced this bias for nearly all groups.

Conclusion: Probabilistic linkage of HES reduced missed matches and bias in estimated readmission rates, with clear implications for commissioning, service evaluation and performance monitoring of hospitals. The existing algorithm should be modified to address data linkage error, and a retrospective update of the existing data would correct past linkage errors and their consequences.
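The comparison between a deterministic algorithm alone and one with an added probabilistic step can be illustrated with a toy Python sketch. This is not the HESID algorithm or the authors' probabilistic step. The deterministic rule, the fields (`nhs_number`, `dob`, `sex`, `postcode`), the m- and u-probabilities in `M_U` and the `THRESHOLD` value are all hypothetical. Pairs that fail the deterministic rule are scored with standard Fellegi-Sunter log2 match weights, and linked records are grouped into patients with union-find. Missed and false matches are then counted pairwise against a reference-standard patient identifier (`true_patient`), which is only one possible way to define those rates.

```python
import math
from dataclasses import dataclass
from itertools import combinations
from typing import Optional

# HYPOTHETICAL m/u probabilities per field: m = P(agree | same patient),
# u = P(agree | different patients). Illustrative only, not estimated from HES.
M_U = {
    "nhs_number": (0.95, 0.0001),
    "dob": (0.97, 0.01),
    "sex": (0.98, 0.5),
    "postcode": (0.80, 0.001),
}
THRESHOLD = 7.0  # HYPOTHETICAL log2 weight cut-off for accepting a link


@dataclass(frozen=True)
class Record:
    rec_id: str
    nhs_number: Optional[str]
    dob: Optional[str]
    sex: Optional[str]
    postcode: Optional[str]
    true_patient: str  # reference-standard identity, used only for evaluation


def deterministic_match(a: Record, b: Record) -> bool:
    """Toy exact-match rule (NOT the real HESID rules): same NHS number, or
    same date of birth, sex and postcode with all three present."""
    if a.nhs_number is not None and a.nhs_number == b.nhs_number:
        return True
    keys = ("dob", "sex", "postcode")
    return all(getattr(a, k) is not None and getattr(a, k) == getattr(b, k)
               for k in keys)


def match_weight(a: Record, b: Record) -> float:
    """Fellegi-Sunter weight: sum of log2 likelihood ratios over fields.
    A field missing on either record contributes nothing."""
    total = 0.0
    for field, (m, u) in M_U.items():
        va, vb = getattr(a, field), getattr(b, field)
        if va is None or vb is None:
            continue
        if va == vb:
            total += math.log2(m / u)
        else:
            total += math.log2((1 - m) / (1 - u))
    return total


def link(records: list[Record], use_probabilistic: bool) -> dict[str, str]:
    """Deterministic pass, optionally followed by a probabilistic pass on the
    pairs it left unlinked; linked pairs are merged with union-find."""
    parent = {r.rec_id: r.rec_id for r in records}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in combinations(records, 2):
        if deterministic_match(a, b) or (
                use_probabilistic and match_weight(a, b) >= THRESHOLD):
            parent[find(a.rec_id)] = find(b.rec_id)
    return {r.rec_id: find(r.rec_id) for r in records}


def error_rates(records: list[Record],
                clusters: dict[str, str]) -> tuple[float, float]:
    """Pairwise missed-match rate (true pairs left unlinked / all true pairs)
    and false-match rate (linked non-matches / all linked pairs)."""
    true_pairs = linked_pairs = missed = false_matches = 0
    for a, b in combinations(records, 2):
        is_true = a.true_patient == b.true_patient
        is_linked = clusters[a.rec_id] == clusters[b.rec_id]
        true_pairs += is_true
        linked_pairs += is_linked
        missed += is_true and not is_linked
        false_matches += is_linked and not is_true
    return (missed / true_pairs if true_pairs else 0.0,
            false_matches / linked_pairs if linked_pairs else 0.0)


records = [
    Record("e1", "111", "1998-01-13", "F", "AB1 2CD", "p1"),
    Record("e2", None, "1998-01-13", "F", "AB1 2CD", "p1"),
    Record("e3", None, "1998-01-13", "F", None, "p1"),  # no NHS number, no postcode
    Record("e4", "222", "2005-03-28", "M", "EF3 4GH", "p2"),
]
for flag in (False, True):
    missed, false_rate = error_rates(records, link(records, use_probabilistic=flag))
    print(f"probabilistic={flag}: missed={missed:.2f}, false={false_rate:.2f}")
# probabilistic=False: missed=0.67, false=0.00
# probabilistic=True: missed=0.00, false=0.00
```

In this toy example the third episode has neither an NHS number nor a postcode, as might happen for a patient recorded with no fixed abode. The deterministic rule therefore misses it, while the probabilistic weight from date of birth and sex alone clears the threshold. In a sample selected on shared birth dates, such a permissive threshold would also link different patients with the same date of birth and sex. The threshold therefore has to balance missed matches against false matches, for example by evaluating it against a reference standard.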
