Linkage of Administrative Datasets: Enhancing Longitudinal Epidemiological Studies in the Era of “Big Data”

Abstract

“Modern epidemiology” has consolidated the direct collection of individual data as the most valued approach for conducting epidemiological research. An essential feature of powerful epidemiological studies (whatever the design: observational, quasi-experimental, or experimental) is a longitudinal structure, in which data are collected over time and measurements can be repeated for each participant. Notably, the amount and variety of individual health data routinely collected from different sources and available in digital form have increased exponentially. This growing volume of data has forced scientific disciplines to confront essential challenges in the operational (data management, infrastructure, training), methodological (new approaches to analyzing and deriving inferences from “big data”), and epistemological (several argue that hypothesis-driven science is outdated and that we now live in a data-driven era) realms. There is no doubt that the use of large administrative databases, particularly when enriched through linkage with other data sources, is a powerful tool with the potential to bolster medical and epidemiological longitudinal research, although it is still in its infancy. Being relatively fast and low cost, it can enable the study of essential research questions that were previously unfeasible for budgetary or ethical reasons, among others.
