Probabilistic record linkage and a method to calculate the positive predictive value.

BACKGROUND: Computerized record linkage is commonly used in cohort studies to ascertain the study outcome. Its accuracy in classifying the outcome can therefore be described using the standard epidemiological measures of sensitivity and positive predictive value (PPV).

METHOD: We describe a 'duplicate method' to calculate the PPV of record linkage when each record can be involved in only one match (e.g. linking population files to death files). The method does not require a validation subset of records from both files with detailed personal information (e.g. name and address), and is therefore ideal for linkage projects using anonymous data. The duplicate method assumes that the number of records from one file with zero, one, two, etc., links from the other file is distributed as predicted by combinatorial probabilities. Under this assumption, the number of false positive links, and hence the PPV, can be estimated. We demonstrate the duplicate method using output from anonymous, probabilistic record linkage of census and mortality records in New Zealand.

RESULTS: The PPV estimates conformed to the pattern expected from the underlying theory of probabilistic record linkage and were robust to sensitivity analyses. We encourage other researchers to assess the accuracy of this method further.
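The idea behind the duplicate method can be illustrated with a small sketch. It uses a simplified toy model that is an assumption, not the estimator published with this method: true links are strictly one-to-one, and each false positive link lands on a record of the receiving file chosen uniformly at random and independently. Under that model the expected number of records with two or more links ("duplicates") is a binomial function of the number of false links F, so an observed duplicate count can be inverted (here by simple bisection) to estimate F, and then PPV = (L - F) / L, where L is the total number of links. The function names and the example numbers are hypothetical and are not data from the New Zealand linkage.

```python
def expected_duplicates(n_records: int, n_links: int, n_false: float) -> float:
    """Expected number of receiving-file records with two or more links.

    Toy model (an assumption, not the published estimator):
    - each of the n_records receiving-file records holds at most one true match;
    - the n_links - n_false true links fall on distinct records;
    - each of the n_false false links lands on a record chosen uniformly at
      random, independently of all other links (binomial occupancy).
    """
    r = 1.0 - 1.0 / n_records                       # a given false link misses a given record
    p0 = r ** n_false                               # record receives no false links
    p1 = n_false / n_records * r ** (n_false - 1)   # record receives exactly one false link
    n_true = n_links - n_false
    # Truly matched records become duplicates if they pick up >= 1 false link;
    # unmatched records become duplicates only if they pick up >= 2 false links.
    return n_true * (1.0 - p0) + (n_records - n_true) * (1.0 - p0 - p1)


def estimate_false_links(n_records: int, n_links: int, n_duplicates: float,
                         iterations: int = 200) -> float:
    """Solve expected_duplicates(...) == n_duplicates for n_false by bisection.

    Assumes the expected duplicate count increases with n_false on [0, n_links],
    which holds when false links are a small fraction of the receiving file.
    """
    lo, hi = 0.0, float(n_links)
    for _ in range(iterations):
        mid = (lo + hi) / 2.0
        if expected_duplicates(n_records, n_links, mid) < n_duplicates:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def ppv(n_links: int, n_false: float) -> float:
    """Positive predictive value: true links divided by all links."""
    return (n_links - n_false) / n_links


if __name__ == "__main__":
    # Hypothetical inputs: 10,000 receiving-file records, 1,000 links, of which 100 are false.
    n_records, n_links = 10_000, 1_000
    observed_dups = expected_duplicates(n_records, n_links, 100)   # roughly 9.4 duplicates
    f_hat = estimate_false_links(n_records, n_links, observed_dups)
    print(round(f_hat, 3), round(ppv(n_links, f_hat), 3))
```

Bisection is used only because it is simple and needs no external library; any bracketing root finder would serve. In practice the observed duplicate count would come from the linkage output before one-to-one matching is enforced, and the uniform-random placement of false links is the key assumption that would need checking against real data.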
