Practical introduction to record linkage for injury research

The frequency of early fatality and the transient nature of emergency medical care mean that a single database will rarely suffice for population-based injury research. Linking records from multiple data sources is therefore a promising method for injury surveillance or trauma system evaluation. This article reviews the historical development of record linkage, provides a basic mathematical foundation, discusses practical issues, and considers ethical concerns. Clerical or computer-assisted deterministic record linkage methods may suffice for some applications, but probabilistic methods are particularly useful for larger studies. The probabilistic method attempts to simulate human reasoning by comparing each of several data elements in the two records. The basic mathematical specifications are derived algebraically from fundamental concepts of probability, although the theory can be extended to include more advanced mathematics. Probabilistic, deterministic, and clerical techniques may be combined in different ways depending upon the goal of the record linkage project. If a population parameter is being estimated for a purely statistical study, a completely probabilistic approach may be most efficient. For other applications, where the purpose is to make inferences about specific individuals from their data in two or more files, the need for a high positive predictive value favors a deterministic method or a probabilistic method with careful clerical review. Whatever techniques are used, researchers must recognize that combining data sources entails ethical obligations beyond those that apply to each source used alone.
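The contrast between deterministic and probabilistic linkage can be made concrete with a small sketch. It assumes the widely taught Fellegi-Sunter formulation, in which each compared data element contributes a log-likelihood-ratio weight: log2(m/u) when the element agrees and log2((1 - m)/(1 - u)) when it disagrees, where m is the probability of agreement among true matches and u the probability of agreement among non-matches. This is an assumption about notation, not necessarily the article's own derivation. The field names, m/u values, thresholds (`UPPER`, `LOWER`), and records are hypothetical and chosen only for illustration; in practice m and u must be estimated from the files being linked. Fields are assumed to agree independently given match status, and a missing value contributes no weight.

```python
import math

# Hypothetical m- and u-probabilities per comparison field (illustrative only).
# m = P(field agrees | the two records belong to the same person)
# u = P(field agrees | the two records belong to different people)
FIELD_PROBS = {
    "last_name": (0.95, 0.01),
    "birth_date": (0.97, 0.003),
    "sex": (0.98, 0.5),
    "zip_code": (0.90, 0.05),
}

# Hypothetical decision thresholds on the total weight (log2 scale).
UPPER = 10.0  # at or above: accept as a link
LOWER = 0.0   # at or below: reject as a non-link; in between: clerical review


def field_weight(agrees: bool, m: float, u: float) -> float:
    """Log2 likelihood ratio contributed by one field comparison."""
    if agrees:
        return math.log2(m / u)
    return math.log2((1 - m) / (1 - u))


def match_weight(rec_a: dict, rec_b: dict) -> float:
    """Sum field weights, assuming fields agree independently given match status."""
    total = 0.0
    for field, (m, u) in FIELD_PROBS.items():
        a, b = rec_a.get(field), rec_b.get(field)
        if a is None or b is None:
            continue  # a missing value contributes no evidence either way
        total += field_weight(a == b, m, u)
    return total


def classify(weight: float) -> str:
    """Three-way decision: link, non-link, or clerical review."""
    if weight >= UPPER:
        return "link"
    if weight <= LOWER:
        return "non-link"
    return "clerical review"


def deterministic_match(rec_a: dict, rec_b: dict) -> bool:
    """Simple deterministic rule: every field must be present and agree exactly."""
    return all(
        rec_a.get(f) is not None and rec_a.get(f) == rec_b.get(f)
        for f in FIELD_PROBS
    )


# Hypothetical ambulance and hospital discharge records for the same patient.
ambulance = {"last_name": "SMITH", "birth_date": "1960-03-14", "sex": "M", "zip_code": "84112"}
hospital = {"last_name": "SMITH", "birth_date": "1960-03-14", "sex": "M", "zip_code": "84105"}

w = match_weight(ambulance, hospital)
print(round(w, 2))                               # 12.63
print(classify(w))                               # link
print(deterministic_match(ambulance, hospital))  # False
```

Here the exact-agreement rule rejects the pair because the ZIP codes differ, while the summed weight of about 12.6 clears the upper threshold. Pairs whose weight falls between the two thresholds are routed to clerical review, which is one way of combining probabilistic and clerical techniques when a high positive predictive value is needed.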
