Paper information: The analysis of record‐linked data using multiple imputation with data value priors

Probabilistic record linkage techniques assign match weights to one or more potential matches for those individual records that cannot be assigned 'unequivocal matches' across data files. Existing methods select the single record having the maximum weight provided that this weight is higher than an assigned threshold. We argue that this procedure, which ignores all information from matches with lower weights and for some individuals assigns no match, is inefficient and may also lead to biases in subsequent analysis of the linked data. We propose that a multiple imputation framework be utilised for data that belong to records that cannot be matched unequivocally. In this way, the information from all potential matches is transferred through to the analysis stage. This procedure allows for the propagation of matching uncertainty through a full modelling process that preserves the data structure. For purposes of statistical modelling, results from a simulation example suggest that a full probabilistic record linkage is unnecessary and that standard multiple imputation will provide unbiased and efficient parameter estimates.
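The idea of carrying every potential match through to the analysis stage can be illustrated with a minimal sketch. It is not the authors' procedure: it assumes the match weights have already been converted to non-negative values that can be used directly as draw probabilities, it uses a simple mean as the analysis model, and the function names (`impute_once`, `rubin_combine`, `mi_mean`) are hypothetical. Each imputation draws one candidate value per unmatched record in proportion to its weight, and the per-imputation estimates are pooled with Rubin's rules.

```python
import random
import statistics


def impute_once(candidates, rng):
    """Draw one value per unmatched record.

    candidates: one list per unmatched record, holding (value, weight) pairs.
    Weights are assumed non-negative (an assumption of this sketch).
    """
    draws = []
    for options in candidates:
        values = [value for value, _ in options]
        weights = [weight for _, weight in options]
        # Sample a candidate value with probability proportional to its weight.
        draws.append(rng.choices(values, weights=weights, k=1)[0])
    return draws


def rubin_combine(estimates, variances):
    """Pool per-imputation estimates and variances with Rubin's rules."""
    m = len(estimates)
    pooled = statistics.fmean(estimates)
    within = statistics.fmean(variances)
    between = statistics.variance(estimates) if m > 1 else 0.0
    total = within + (1 + 1 / m) * between
    return pooled, total


def mi_mean(linked_values, candidates, m=20, seed=0):
    """Estimate a mean from unequivocally linked values plus imputed ones."""
    rng = random.Random(seed)
    estimates, variances = [], []
    for _ in range(m):
        completed = linked_values + impute_once(candidates, rng)
        estimates.append(statistics.fmean(completed))
        # Sampling variance of the mean for this completed data set.
        variances.append(statistics.variance(completed) / len(completed))
    return rubin_combine(estimates, variances)


if __name__ == "__main__":
    linked = [4.1, 5.0, 3.8, 4.6]              # values with unequivocal matches
    uncertain = [
        [(4.0, 0.7), (6.5, 0.3)],              # two potential matches
        [(3.2, 0.5), (3.9, 0.4), (7.0, 0.1)],  # three potential matches
    ]
    print(mi_mean(linked, uncertain, m=50, seed=1))
```

In contrast with keeping only the maximum-weight candidate above a threshold, no record is discarded here, and the between-imputation variance reflects how much the matching uncertainty affects the estimate.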
