Evaluation of record linkage of two large administrative databases in a middle income country: stillbirths and notifications of dengue during pregnancy in Brazil

Background: Due to the increasing availability of individual-level information across different electronic datasets, record linkage has become an efficient and important research tool. High quality linkage is essential for producing robust results. The objective of this study was to describe the process of preparing and linking national Brazilian datasets, and to compare the accuracy of different linkage methods for assessing the risk of stillbirth due to dengue in pregnancy.

Methods: We linked mothers and stillbirths in two routinely collected datasets from Brazil for 2009–2010: for dengue in pregnancy, notifications of infectious diseases (SINAN); for stillbirths, mortality (SIM). Since there was no unique identifier, we used probabilistic linkage based on maternal name, age and municipality. We compared two probabilistic approaches, each with two thresholds: 1) a bespoke linkage algorithm; 2) a standard linkage software widely used in Brazil (ReclinkIII), and used manual review to identify further links. Sensitivity and positive predictive value (PPV) were estimated using a subset of gold-standard data created through manual review. We examined the characteristics of false-matches and missed-matches to identify any sources of bias.

Results: From records of 678,999 dengue cases and 62,373 stillbirths, the gold-standard linkage identified 191 cases. The bespoke linkage algorithm with a conservative threshold produced 131 links, with sensitivity = 64.4% (68 missed-matches) and PPV = 92.5% (8 false-matches). Manual review of uncertain links identified an additional 37 links, increasing sensitivity to 83.7%. The bespoke algorithm with a relaxed threshold identified 132 true matches (sensitivity = 69.1%), but introduced 61 false-matches (PPV = 68.4%). ReclinkIII produced lower sensitivity and PPV than the bespoke linkage algorithm. Linkage error was not associated with any recorded study variables.

Conclusion: Despite a lack of unique identifiers for linking mothers and stillbirths, we demonstrate a high standard of linkage of large routine databases from a middle income country. Probabilistic linkage and manual review were essential for accurately identifying cases for a case-control study, but this approach may not be feasible for larger databases or for linkage of more common outcomes.
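To make the two accuracy measures concrete, the following minimal sketch computes sensitivity and PPV from counts of true matches, false matches and gold-standard links. The function name `linkage_accuracy` and its parameters are hypothetical and are not taken from the study's code; the example call uses the counts reported in the Results for the bespoke algorithm with the relaxed threshold (132 true matches, 61 false-matches, 191 gold-standard cases).

```python
def linkage_accuracy(true_matches, false_matches, gold_standard_total):
    """Return (sensitivity, ppv, missed_matches) for a linkage result.

    Hypothetical helper for illustration only.
    sensitivity = true matches / all gold-standard links
    PPV         = true matches / all links the algorithm produced
    """
    missed_matches = gold_standard_total - true_matches
    sensitivity = true_matches / gold_standard_total
    ppv = true_matches / (true_matches + false_matches)
    return sensitivity, ppv, missed_matches


# Counts reported for the bespoke algorithm with the relaxed threshold.
sens, ppv, missed = linkage_accuracy(
    true_matches=132, false_matches=61, gold_standard_total=191
)
print(f"sensitivity = {sens:.1%}, PPV = {ppv:.1%}, missed-matches = {missed}")
# prints: sensitivity = 69.1%, PPV = 68.4%, missed-matches = 59
```

The same function shows the trade-off described in the abstract: relaxing the threshold adds true matches, which raises sensitivity, but every extra false-match enlarges the PPV denominator.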
