Paper information - Data Fusion: Identification Problems, Validity, and Multiple Imputation

Data fusion techniques typically aim to build a complete data file from different sources that do not contain the same units. Traditionally, data fusion, also known in the US as statistical matching, is based on variables common to all files. It is well known that such approaches impose conditional independence of the (specific) variables not jointly observed given the common variables, although these variables may be conditionally dependent in reality. However, if the common variables are (carefully) chosen so that conditional independence already holds, then inference about the actually unobserved association is valid. In terms of regression analysis, this implies that the common variables have high explanatory power for the specific variables. Unfortunately, this assumption is not yet testable. Hence, we structure and discuss the objectives of statistical matching in the light of their feasibility. Four levels of validity that a matching technique may achieve are introduced. Suitable multiple imputation (MI) techniques are used to reflect the identification problem inherent in data fusion. A simulation study also shows that MI allows auxiliary information to be used efficiently and easily.
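The conditional independence point in the abstract can be illustrated with a small simulation. The following is a minimal sketch and not the paper's own simulation study: the data-generating model (a single common variable Z, specific variables X and Y with correlated errors), the nearest-neighbour donor rule, and all function names are assumptions chosen for illustration. File A observes (Z, X), file B observes (Z, Y), and Y is fused into file A by matching on Z alone.

```python
import numpy as np


def partial_corr(x, y, z):
    """Correlation of x and y after removing the linear effect of z."""
    design = np.column_stack([np.ones_like(z), z])
    res_x = x - design @ np.linalg.lstsq(design, x, rcond=None)[0]
    res_y = y - design @ np.linalg.lstsq(design, y, rcond=None)[0]
    return np.corrcoef(res_x, res_y)[0, 1]


def nearest_neighbour_fusion(z_a, z_b, y_b):
    """For each recipient in file A, copy Y from the file-B donor with the closest Z."""
    order = np.argsort(z_b)
    z_sorted, y_sorted = z_b[order], y_b[order]
    # Index of the first sorted donor with Z >= recipient Z, kept inside the array.
    right = np.clip(np.searchsorted(z_sorted, z_a), 1, len(z_sorted) - 1)
    left = right - 1
    use_left = (z_a - z_sorted[left]) <= (z_sorted[right] - z_a)
    return np.where(use_left, y_sorted[left], y_sorted[right])


def simulate(n=5000, rho=0.8, seed=0):
    """Return (true, fused) partial correlation of X and Y given Z.

    rho is the assumed conditional correlation of X and Y given Z.
    """
    rng = np.random.default_rng(seed)

    def draw(size):
        z = rng.normal(size=size)
        errors = rng.multivariate_normal([0, 0], [[1, rho], [rho, 1]], size=size)
        return z, z + errors[:, 0], z + errors[:, 1]

    # File A: Y is generated only so the true association can be compared.
    z_a, x_a, y_a_true = draw(n)
    # File B: X is never observed and is discarded.
    z_b, _, y_b = draw(n)

    y_a_fused = nearest_neighbour_fusion(z_a, z_b, y_b)
    return partial_corr(x_a, y_a_true, z_a), partial_corr(x_a, y_a_fused, z_a)


if __name__ == "__main__":
    true_pc, fused_pc = simulate()
    print(f"partial correlation given Z, true data:  {true_pc:.3f}")
    print(f"partial correlation given Z, fused data: {fused_pc:.3f}")
```

Under these assumptions the fused file shows a partial correlation of X and Y given Z close to zero even though the generating model sets it to 0.8, which is the sense in which matching on the common variables imposes conditional independence.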

Susanne Rässler