A New Method for Assessing How Sensitivity and Specificity of Linkage Studies Affects Estimation

Background: While the importance of record linkage is widely recognised, few studies have attempted to quantify how linkage errors may have affected their own findings and outcomes. Even where authors of linkage studies have estimated sensitivity and specificity from subjects with known status, the effects of false negatives and false positives on event rates and estimates of effect are seldom described.

Methods: We quantify the effect of the sensitivity and specificity of the linkage process on event rates and incidence, and the resulting effect on relative risks. Formulae to estimate the true number of events and the relative risk, adjusted for a given linkage sensitivity and specificity, are then derived and applied to data from a prisoner mortality study. The implications of false positive and false negative matches are also discussed.

Discussion: Comparisons of the effect of sensitivity and specificity on incidence and relative risks indicate that it is more important for linkages to be highly specific than highly sensitive, particularly if true incidence rates are low. We recommend that, where possible, quantitative estimates of the sensitivity and specificity of the linkage process be obtained, so that the effect of these quantities on observed results can be assessed.
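The kind of adjustment described above can be illustrated with a minimal sketch. It is not the authors' derivation, which is not reproduced in this abstract; it assumes the standard misclassification relation, observed = Se × true + (1 − Sp) × (N − true), with persons (not person-time) as the unit and linkage errors that are the same in both groups. All function names and counts are hypothetical.

```python
def observed_events(true_events, n, sensitivity, specificity):
    """Expected linked events: true positives plus false-positive matches."""
    return sensitivity * true_events + (1.0 - specificity) * (n - true_events)


def adjusted_events(observed, n, sensitivity, specificity):
    """Invert the relation above to back-calculate the true number of events."""
    denom = sensitivity + specificity - 1.0
    if denom <= 0:
        raise ValueError("sensitivity + specificity must exceed 1")
    estimate = (observed - (1.0 - specificity) * n) / denom
    # Keep the estimate within the feasible range [0, n].
    return min(max(estimate, 0.0), float(n))


def relative_risk(events_a, n_a, events_b, n_b):
    """Ratio of cumulative incidence in group A to that in group B."""
    return (events_a / n_a) / (events_b / n_b)


# Hypothetical cohort: 10,000 people per group, true relative risk of 2.
N = 10_000
TRUE_EXPOSED, TRUE_UNEXPOSED = 200, 100

# Imperfect sensitivity only (Se = 0.90, Sp = 1.00).
rr_low_sensitivity = relative_risk(
    observed_events(TRUE_EXPOSED, N, 0.90, 1.00), N,
    observed_events(TRUE_UNEXPOSED, N, 0.90, 1.00), N,
)

# Imperfect specificity only (Se = 1.00, Sp = 0.99).
rr_low_specificity = relative_risk(
    observed_events(TRUE_EXPOSED, N, 1.00, 0.99), N,
    observed_events(TRUE_UNEXPOSED, N, 1.00, 0.99), N,
)

# Back-calculate the true count from an observed count (Se = 0.90, Sp = 0.99).
recovered = adjusted_events(
    observed_events(TRUE_EXPOSED, N, 0.90, 0.99), N, 0.90, 0.99
)
```

In this hypothetical example a 10% loss of sensitivity leaves the relative risk at 2, whereas a 1% loss of specificity pulls it to roughly 1.5, because false-positive matches add a similar number of spurious events to both groups when true incidence is low.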
