Error adjustments for file linking methods using encrypted unique client identifier (eUCI) with application to recently released prisoners who are HIV+

Incarceration provides an opportunity to test for HIV, to provide treatment such as highly active antiretroviral therapy, and to link infected persons to comprehensive HIV care upon their release. A key factor in assessing the success of a program that links released individuals to care is the time from release to receiving care in the community (linkage time). To estimate the linkage time, records from correction systems are linked to Ryan White Clinic data using the encrypted Unique Client Identifier (eUCI). Most record pairs linked using eUCI belong to the same individual; in some cases, however, eUCI links records incorrectly or fails to identify records that should have been linked. We propose a Bayesian procedure that identifies the correctly linked records among all linked records. It relies on the relationships between variables that appear in either of the data sources, as well as variables that exist in both. The procedure generates K datasets, in each of which every pair of linked records is classified as a true link or a false link. The K datasets are analyzed independently, and the results are combined using Rubin's multiple imputation rules. A small validation dataset is used to examine different statistical models and to inform the prior distributions of the parameters. In comparison with previously proposed methods, the proposed method utilizes all of the available data and is both flexible and computationally efficient. In addition, this approach can be applied in other file linking applications.
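The combining step can be illustrated with a minimal sketch of Rubin's rules for a scalar quantity such as mean linkage time. It assumes each of the K datasets has already been analyzed to give a point estimate and its variance; the function name `combine_rubin` and the example numbers are hypothetical and do not come from the study.

```python
from statistics import mean, variance


def combine_rubin(estimates, variances):
    """Combine K completed-data analyses of a scalar with Rubin's rules.

    estimates: point estimates, one per dataset (hypothetical inputs)
    variances: the corresponding within-dataset variances
    Returns (pooled estimate, within variance, between variance, total variance).
    """
    k = len(estimates)
    if k < 2 or k != len(variances):
        raise ValueError("need K >= 2 estimates with matching variances")

    q_bar = mean(estimates)      # pooled point estimate
    w = mean(variances)          # average within-dataset variance
    b = variance(estimates)      # between-dataset variance (divisor K - 1)
    t = w + (1 + 1 / k) * b      # total variance of the pooled estimate
    return q_bar, w, b, t


# Illustrative values only: three datasets, estimates in days.
q_bar, w, b, t = combine_rubin([10.0, 12.0, 14.0], [4.0, 4.0, 4.0])
```

The between-dataset term carries the uncertainty about which links are true, so the total variance exceeds the average within-dataset variance whenever the K analyses disagree.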
