A Bayesian Procedure for File Linking to Analyze End-of-Life Medical Costs

End-of-life medical expenses are a significant proportion of all health care expenditures. We studied these costs by combining the costs of services from Medicare claims with cause of death (CoD) from death certificates. In the absence of a unique identifier linking the two datasets, common variables identified unique matches for only 33% of deaths. The remaining cases fell into cells containing multiple cases: 32% in cells with an equal number of cases from each file and 35% in cells with an unequal number. We sampled from the joint posterior distribution of the model parameters and of the permutations that link cases from the two files within each cell. The linking models included a regression of location of death on CoD and other variables, and a regression of cost measures with a monotone missing-data pattern on CoD and other demographic characteristics. Permutations were sampled by enumerating the exact distribution for small cells and by the Metropolis algorithm for large cells. Sparse matrix data structures enabled efficient computation despite the size of the dataset (≈1.7 million cases). The procedure generates m datasets in which the matches between the two files are imputed. The m datasets can be analyzed independently, and the results can be combined using Rubin's multiple imputation rules. The approach can also be applied in other file-linking applications. Supplementary materials for this article are available online.
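Cases are grouped into cells by agreement on the variables common to both files. Each cell is then either a unique match, an equal-size cell, or an unequal-size cell. Below is a minimal sketch of that classification, assuming each file is a list of Python dicts. The field names (`sex`, `yob`, `zip`), the helper names, and the choice to report shares over death-certificate records are all illustrative assumptions. They are not the paper's actual blocking variables or code.

```python
from collections import defaultdict


def classify_cells(file_a, file_b, keys):
    """Group records from two files into cells that agree on the common
    variables `keys`, and label each cell by its composition.

    file_a, file_b: lists of dicts (hypothetical record layout).
    Returns {cell_key: (label, n_a, n_b)}.
    """
    counts = defaultdict(lambda: [0, 0])
    for rec in file_a:
        counts[tuple(rec[k] for k in keys)][0] += 1
    for rec in file_b:
        counts[tuple(rec[k] for k in keys)][1] += 1

    cells = {}
    for key, (n_a, n_b) in counts.items():
        if n_a == 1 and n_b == 1:
            label = "unique"   # one-to-one: no linking uncertainty
        elif n_a == n_b:
            label = "equal"    # n-to-n: links are an unknown permutation
        else:
            label = "unequal"  # sizes differ (including a zero count)
        cells[key] = (label, n_a, n_b)
    return cells


def share_by_label(cells):
    """Fraction of file-A records that fall in each type of cell."""
    total = sum(n_a for _, n_a, _ in cells.values())
    shares = defaultdict(float)
    for label, n_a, _ in cells.values():
        shares[label] += n_a / total
    return dict(shares)


if __name__ == "__main__":
    # Hypothetical death-certificate (file A) and claims (file B) records.
    deaths = [
        {"sex": "F", "yob": 1920, "zip": "02115"},
        {"sex": "M", "yob": 1925, "zip": "02115"},
        {"sex": "M", "yob": 1925, "zip": "02115"},
        {"sex": "F", "yob": 1930, "zip": "02139"},
    ]
    claims = deaths + [{"sex": "F", "yob": 1930, "zip": "02139"}]
    cells = classify_cells(deaths, claims, ["sex", "yob", "zip"])
    print(share_by_label(cells))  # {'unique': 0.25, 'equal': 0.5, 'unequal': 0.25}
```

Only the equal and unequal cells carry linking uncertainty, so only they enter the permutation sampler.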
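Within an equal-size cell, a linkage is a permutation π that pairs each death-certificate record x_i with a claims record y_π(i). The sketch below draws π given the current parameter values. It assumes a uniform prior over permutations and a precomputed table `loglik[i][j]` = log f(x_i, y_j | θ), which would come from the linking regressions. For small cells it enumerates all n! permutations. Otherwise it runs Metropolis with a pairwise-swap proposal. The swap proposal, the cutoff `max_exact=6`, and the function names are assumptions for illustration, not necessarily the paper's choices. Unequal-size cells would need extra handling that is not shown.

```python
import itertools
import math
import random


def log_weight(loglik, perm):
    """Log of prod_i f(x_i, y_perm[i]) from a table loglik[i][j] = log f(x_i, y_j)."""
    return sum(loglik[i][j] for i, j in enumerate(perm))


def sample_exact(loglik, rng):
    """Enumerate all n! within-cell permutations and draw one from the exact
    normalized distribution (uniform prior on permutations assumed)."""
    n = len(loglik)
    perms = list(itertools.permutations(range(n)))
    logw = [log_weight(loglik, p) for p in perms]
    top = max(logw)
    weights = [math.exp(lw - top) for lw in logw]  # subtract max to avoid underflow
    return list(rng.choices(perms, weights=weights, k=1)[0])


def sample_metropolis(loglik, rng, perm, n_steps):
    """Metropolis sampler over permutations. Each step proposes swapping the
    partners of two randomly chosen records. The proposal is symmetric, so a
    move is accepted with probability min(1, ratio of target densities)."""
    perm = list(perm)
    n = len(perm)
    if n < 2:
        return perm
    for _ in range(n_steps):
        i, j = rng.sample(range(n), 2)
        a, b = perm[i], perm[j]
        # Change in log-weight if x_i is relinked to y_b and x_j to y_a.
        delta = (loglik[i][b] + loglik[j][a]) - (loglik[i][a] + loglik[j][b])
        if delta >= 0 or rng.random() < math.exp(delta):
            perm[i], perm[j] = b, a
    return perm


def sample_permutation(loglik, rng, perm=None, max_exact=6, n_steps=1000):
    """Exact enumeration for small cells, Metropolis for large ones."""
    n = len(loglik)
    if n <= max_exact:
        return sample_exact(loglik, rng)
    start = perm if perm is not None else list(range(n))
    return sample_metropolis(loglik, rng, start, n_steps)


if __name__ == "__main__":
    rng = random.Random(0)
    n = 8
    # Hypothetical cell: pairing x_i with y_i is far more likely than any other pairing.
    loglik = [[0.0 if i == j else -20.0 for j in range(n)] for i in range(n)]
    start = list(reversed(range(n)))  # deliberately poor starting linkage
    print(sample_permutation(loglik, rng, perm=start, n_steps=5000))
```

A full data-augmentation sampler would alternate this draw with draws of the regression parameters given the current links. Passing the previous permutation as `perm` lets each large cell's chain continue from where it left off.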
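The m linked datasets are analyzed separately and then combined with Rubin's multiple imputation rules. For a scalar estimand, the combined estimate is the mean of the m estimates. The total variance is T = W + (1 + 1/m)B, where W is the average within-dataset variance and B is the between-dataset variance of the estimates. The reference t distribution has (m − 1)(1 + W/((1 + 1/m)B))² degrees of freedom. Below is a minimal stdlib sketch; the function name and the input numbers are hypothetical.

```python
import math
import statistics


def combine_rubin(estimates, variances):
    """Combine m completed-data results for a scalar estimand with Rubin's rules.

    estimates: point estimates Q_1, ..., Q_m, one per imputed (linked) dataset
    variances: their estimated sampling variances U_1, ..., U_m
    Returns (q_bar, total_variance, df).
    """
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need m >= 2 estimates, each with a variance")
    q_bar = statistics.fmean(estimates)      # combined point estimate
    w = statistics.fmean(variances)          # average within-imputation variance
    b = statistics.variance(estimates)       # between-imputation variance (divisor m - 1)
    t = w + (1 + 1 / m) * b                  # total variance
    if b == 0:
        df = math.inf                        # imputations agree exactly
    else:
        df = (m - 1) * (1 + w / ((1 + 1 / m) * b)) ** 2
    return q_bar, t, df


if __name__ == "__main__":
    # Hypothetical results from m = 5 linked datasets.
    q, t, df = combine_rubin([10.0, 12.0, 11.0, 13.0, 9.0], [4.0] * 5)
    print(q, t, round(df, 2))  # 11.0 7.0 21.78
```

A 95% interval would be q_bar ± t(df, 0.975)·√T. The standard library has no t quantile function, so that step is omitted here.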
