Data Fusion: Identification Problems, Validity, and Multiple Imputation

Data fusion techniques typically aim to produce a complete data file from different sources that do not contain the same units. Traditionally, data fusion, known in the US as statistical matching, is carried out on the basis of variables common to all files. It is well known that such approaches impose conditional independence, given the common variables, on the (specific) variables that are not jointly observed, although these variables may be conditionally dependent in reality. However, if the common variables are (carefully) chosen so that conditional independence actually holds, then inference about the unobserved association is valid. In terms of regression analysis, this means that the common variables must have high explanatory power for the specific variables. Unfortunately, this assumption cannot be tested. Hence, we structure and discuss the objectives of statistical matching in the light of their feasibility. Four levels of validity that a matching technique may achieve are introduced. Suitable multiple imputation (MI) techniques are used to reflect the identification problem inherent in data fusion. A simulation study also shows that MI makes it easy to use auxiliary information efficiently.
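
The conditional independence effect described above can be illustrated with a minimal sketch. The data-generating model, the coefficients, the sample size, and the use of nearest-neighbour matching on a single common variable are all assumptions made here for illustration; they are not taken from the paper. File A observes (Z, X), file B observes (Z, Y), and X and Y are constructed to be dependent even after conditioning on Z.

```python
import bisect
import random


def pearson(a, b):
    """Plain Pearson correlation of two equally long sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5


def draw_unit(rng):
    """Hypothetical model: the shared term u makes X and Y dependent given Z."""
    z = rng.gauss(0, 1)
    u = rng.gauss(0, 1)
    x = 0.5 * z + u
    y = 0.5 * z + 0.8 * u + rng.gauss(0, 1)
    return z, x, y


def nearest_donor(sorted_z, z):
    """Index of the donor whose common variable is closest to z."""
    i = bisect.bisect_left(sorted_z, z)
    candidates = [j for j in (i - 1, i) if 0 <= j < len(sorted_z)]
    return min(candidates, key=lambda j: abs(sorted_z[j] - z))


def simulate(n=2000, seed=1):
    rng = random.Random(seed)
    file_a = [draw_unit(rng) for _ in range(n)]  # Y is treated as unobserved here
    file_b = [draw_unit(rng) for _ in range(n)]  # X is treated as unobserved here

    # Donor pool from file B, sorted by the common variable Z.
    donors = sorted((z, y) for z, _, y in file_b)
    donor_z = [z for z, _ in donors]

    x_obs = [x for _, x, _ in file_a]
    y_true = [y for _, _, y in file_a]  # known only because this is a simulation
    y_fused = [donors[nearest_donor(donor_z, z)][1] for z, _, _ in file_a]
    return pearson(x_obs, y_true), pearson(x_obs, y_fused)


true_corr, fused_corr = simulate()
print(f"corr(X, Y) in the complete data: {true_corr:.3f}")
print(f"corr(X, Y) in the fused file:    {fused_corr:.3f}")
```

Under this assumed model the population correlation of X and Y is about 0.68, whereas conditional independence given Z would imply about 0.16. The fused file should reproduce a value near the latter, because the donor's Y is linked to the recipient's X only through Z.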
