Combining information from two data sources with misreporting and incompleteness to assess hospice‐use among cancer patients: a multiple imputation approach

Combining information from multiple data sources can improve estimates of health-related measures: one source supplies information that another lacks, provided the supplying source is accurate and complete. However, little research has addressed how to combine sources when each may be imperfect, for example subject to measurement error, missing data, or both. In a multisite study of hospice-use by late-stage cancer patients, hospice-use was recorded in patients' abstracted medical records, where it may be considerably underreported because the records were incompletely acquired. Data for Medicare-eligible patients were therefore supplemented with their Medicare claims, which also contain information on hospice-use and may also underreport it, though to a lesser degree. In addition, both sources had missing data: unit nonresponse in medical record abstraction and sample undercoverage in Medicare claims. We treat each patient's true hospice-use status as a latent variable and propose to multiply impute it using information from both data sources, borrowing strength from each. We characterize the complete-data model as the product of an 'outcome' model for the probability of hospice-use and a 'reporting' model for the probability of underreporting in each source, adjusting for other covariates. Assuming that the reports of hospice-use from both sources are missing at random and that underreporting in the two sources is conditionally independent, we develop a Bayesian multiple imputation algorithm and conduct multiple imputation analyses of patient hospice-use in demographic and clinical subgroups. The proposed approach yields more sensible results than alternative methods in our example. Our model is also related to dual-system estimation in population censuses and dual exposure assessment in epidemiology.
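To make the structure of the outcome and reporting models concrete, the following is a minimal sketch of the imputation step only, not the authors' algorithm. It assumes fixed, known values for the probability of hospice-use (`p`) and for each source's probability of reporting true use (`s1`, `s2`), no false-positive reports, no covariates, and conditionally independent underreporting; all function and parameter names are hypothetical. A full Bayesian treatment would instead draw these parameters from their posterior distribution before each imputation.

```python
import random

def posterior_true_use(p, s1, s2, r1, r2):
    """P(true use = 1 | reports) under underreporting only.

    p      : assumed prior probability of hospice use (outcome model)
    s1, s2 : assumed probabilities that each source reports a true user
    r1, r2 : reports from each source: 1 (yes), 0 (no), or None (missing)
    A missing report contributes no likelihood term (missing at random).
    """
    # With no false positives, any positive report implies true use.
    if r1 == 1 or r2 == 1:
        return 1.0
    # Likelihood of the observed negative reports if the patient is a user;
    # conditional independence lets the two sources multiply.
    like_user = p
    if r1 == 0:
        like_user *= 1 - s1
    if r2 == 0:
        like_user *= 1 - s2
    # A non-user always reports "no", so the likelihood is just 1 - p.
    like_nonuser = 1 - p
    return like_user / (like_user + like_nonuser)

def impute(records, p, s1, s2, m=5, seed=0):
    """Draw m completed versions of the latent status for (r1, r2) records."""
    rng = random.Random(seed)
    return [
        [int(rng.random() < posterior_true_use(p, s1, s2, r1, r2))
         for r1, r2 in records]
        for _ in range(m)
    ]

# Hypothetical records: (medical-record report, claims report)
records = [(1, 0), (0, 0), (None, 0), (0, None), (None, None)]
completed = impute(records, p=0.5, s1=0.6, s2=0.9)
```

Each of the `m` completed data sets would then be analyzed separately and the results combined with the standard multiple imputation combining rules. Note how a patient with two negative reports still has a nonzero probability of true use, and how that probability is higher when only the less sensitive source reported.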
