Research Paper: An Empiric Modification to the Probabilistic Record Linkage Algorithm Using Frequency-Based Weight Scaling

OBJECTIVE To incorporate value-based weight scaling into the Fellegi-Sunter (F-S) maximum likelihood linkage algorithm and to evaluate the performance of the modified algorithm.

BACKGROUND Because healthcare data are fragmented across many healthcare systems, record linkage is a key component of fully functional health information exchanges. Probabilistic linkage methods produce more accurate, dynamic, and robust matching results than rule-based approaches, particularly when matching patient records that lack unique identifiers. In theory, accounting for the relative frequency of specific data element values can enhance the F-S method, for example by reducing false-positive or false-negative matches. However, to our knowledge, no frequency-based weight scaling modification to the F-S method has been implemented and specifically evaluated using real-world clinical data.

METHODS The authors implemented a value-based weight scaling modification using an information-theoretic model and formally evaluated its effectiveness by linking 51,361 records from Indiana statewide newborn screening data to 80,089 HL7 registration messages from the Indiana Network for Patient Care, an operational health information exchange. In addition to applying the weight scaling modification to all fields, the authors examined the effect of selectively scaling common or uncommon field-specific values.

RESULTS Applying weight scaling to all field-specific values yielded a sensitivity of 95.4%, a specificity of 98.8%, and a positive predictive value of 99.9%. Compared with the unscaled algorithm, the modified F-S algorithm showed a 10% increase in specificity with a 3% decrease in sensitivity.

CONCLUSION By eliminating false-positive matches, the value-based weight modification can enhance the specificity of the F-S method with a minimal decrease in sensitivity.
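The abstract does not state the scaling formula, so the following is only a minimal Python sketch under an assumed information-theoretic model, not the authors' implementation. It assumes that when two records agree on a value v, the field's classic F-S agreement weight log2(m/u) is multiplied by I(v)/H, where I(v) = -log2 p(v) is the information content of v and H is the field's entropy. Because the frequency-weighted mean of I(v)/H is exactly 1, common values are down-weighted and rare values up-weighted. The function names, the `mode` switch that mimics selectively scaling common or uncommon values, and the m/u probabilities in the demo are all hypothetical; missing values and approximate string comparison are ignored.

```python
import math
from collections import Counter


def fs_weights(m, u):
    """Classic Fellegi-Sunter weights (log base 2) for one field.

    m = P(field agrees | true match), u = P(field agrees | non-match).
    """
    agree = math.log2(m / u)
    disagree = math.log2((1 - m) / (1 - u))
    return agree, disagree


def scaling_factors(values):
    """ASSUMED information-theoretic scaling factor I(v) / H per observed value.

    I(v) = -log2 p(v) is the information content of value v, and H (the
    field's entropy) is the frequency-weighted mean of I(v), so the
    factors average to 1 under the observed value distribution.
    """
    counts = Counter(values)
    total = sum(counts.values())
    info = {v: -math.log2(c / total) for v, c in counts.items()}
    entropy = sum((c / total) * info[v] for v, c in counts.items())
    if entropy == 0:  # a single distinct value carries no frequency information
        return {v: 1.0 for v in info}
    return {v: i / entropy for v, i in info.items()}


def field_weight(a, b, m, u, factors, mode="all"):
    """Weight for one field; `mode` is a hypothetical switch for selective scaling."""
    agree_w, disagree_w = fs_weights(m, u)
    if a != b:
        return disagree_w  # disagreement weights are left unscaled in this sketch
    s = factors.get(a, 1.0)  # values unseen in the frequency table stay unscaled
    if mode == "none":
        s = 1.0
    elif mode == "common":
        s = min(s, 1.0)  # only down-weight common values
    elif mode == "uncommon":
        s = max(s, 1.0)  # only up-weight uncommon values
    return s * agree_w


def pair_score(rec_a, rec_b, params, factors, mode="all"):
    """Total F-S score of a record pair: the sum of per-field weights."""
    return sum(
        field_weight(rec_a[f], rec_b[f], m, u, factors[f], mode)
        for f, (m, u) in params.items()
    )


def linkage_metrics(tp, fp, tn, fn):
    """Sensitivity, specificity, and PPV from a linkage confusion matrix."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
    }


if __name__ == "__main__":
    # Toy frequency table (illustrative only): SMITH is common, ZYWICKI is rare.
    last_names = ["SMITH"] * 6 + ["JOHNSON"] * 3 + ["ZYWICKI"]
    factors = {"last_name": scaling_factors(last_names)}
    params = {"last_name": (0.95, 0.05)}  # hypothetical m and u
    for name in ("SMITH", "ZYWICKI"):
        rec = {"last_name": name}
        unscaled = pair_score(rec, rec, params, factors, mode="none")
        scaled = pair_score(rec, rec, params, factors)
        print(f"{name}: unscaled={unscaled:.2f} scaled={scaled:.2f}")
```

Normalizing by the entropy keeps the frequency-weighted average agreement weight of each field equal to its unscaled F-S value, so a shared rare surname contributes more evidence than a shared common one without inflating the field as a whole. Pairs would then be classified against a score threshold and the resulting confusion matrix summarized with `linkage_metrics`.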