Evaluating the effect of data standardization and validation on patient matching accuracy

Objective: This study evaluated the degree to which recommendations for demographic data standardization improve patient matching accuracy using real-world datasets.

Materials and Methods: We used 4 manually reviewed datasets, each containing a random selection of matches and nonmatches. Matching datasets included health information exchange (HIE) records, public health registry records, Social Security Death Master File records, and newborn screening records. Standardized fields included last name, telephone number, social security number, date of birth, and address. Matching performance was evaluated using 4 metrics: sensitivity, specificity, positive predictive value, and accuracy.

Results: Standardizing address was independently associated with improved matching sensitivity for the public health and HIE datasets, by approximately 0.6% and 4.5%, respectively. Overall accuracy was unchanged for both datasets because match specificity decreased. We observed no similar impact of address standardization in the Death Master File dataset. Standardizing last name improved matching sensitivity by 0.6% for the HIE dataset, while overall accuracy remained the same because match specificity decreased. We noted no similar impact for the other datasets. Standardizing the other individual fields (telephone number, date of birth, or social security number) showed no matching improvements. Because standardizing address and last name each improved matching sensitivity, we examined their combined effect: standardizing both fields improved sensitivity from 81.3% to 91.6% for the HIE dataset.

Conclusions: Data standardization can improve match rates, thus ensuring that patients and clinicians have better data on which to make decisions to enhance care quality and safety.
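The four performance metrics named in Materials and Methods have standard definitions in terms of true and false match decisions. As a minimal sketch, assuming each manually reviewed record pair has been tallied into one of four counts, they could be computed as below. The names `MatchCounts` and `matching_metrics` and the example counts are hypothetical and chosen only for illustration; they are not taken from the study.

```python
import math
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class MatchCounts:
    """Tallies of algorithm decisions against manual review (hypothetical container)."""
    tp: int  # true matches that the algorithm linked
    fp: int  # true nonmatches that the algorithm linked
    tn: int  # true nonmatches that the algorithm left unlinked
    fn: int  # true matches that the algorithm left unlinked


def _ratio(numerator: int, denominator: int) -> float:
    """Return numerator / denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else math.nan


def matching_metrics(c: MatchCounts) -> Dict[str, float]:
    """Compute sensitivity, specificity, PPV, and accuracy from the four counts."""
    total = c.tp + c.fp + c.tn + c.fn
    return {
        "sensitivity": _ratio(c.tp, c.tp + c.fn),  # share of true matches found
        "specificity": _ratio(c.tn, c.tn + c.fp),  # share of nonmatches rejected
        "ppv": _ratio(c.tp, c.tp + c.fp),          # share of links that are correct
        "accuracy": _ratio(c.tp + c.tn, total),    # share of all decisions correct
    }


if __name__ == "__main__":
    # Illustrative counts only; these are not data from the study.
    example = MatchCounts(tp=80, fp=10, tn=90, fn=20)
    for name, value in matching_metrics(example).items():
        print(f"{name}: {value:.3f}")
```

These definitions make the pattern reported in Results easier to read: a change that links more true matches raises sensitivity, but if it also links more true nonmatches, specificity falls and accuracy can stay flat.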
