Statistical Matching Analysis for Complex Survey Data With Applications

ABSTRACT The goal of statistical matching is to estimate a joint distribution when only samples from its marginals have been observed. Because the variables of interest are never observed jointly, the joint population distribution function is subject to uncertainty. In the present article, the notion of matching error is introduced and bounded from above by an appropriate measure of uncertainty. Then, an estimate of the distribution function of the variables not jointly observed is constructed on the basis of a modification of the conditional independence assumption in the presence of logical constraints. The corresponding measure of uncertainty is estimated from sample data. Finally, a simulation study is performed, and an application to a real case is provided. Supplementary materials for this article are available online.
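
To illustrate the kind of uncertainty the abstract refers to, the following is a minimal sketch and not the article's estimator. It assumes two marginal distribution functions evaluated on small grids (the arrays `f_y` and `g_z` and the helper `frechet_bounds` are hypothetical names introduced here) and computes the classical Fréchet bounds, which contain every joint distribution function compatible with those marginals. It ignores survey weights, common matching variables, and logical constraints, all of which the article takes into account.

```python
import numpy as np

def frechet_bounds(f_y, g_z):
    """Pointwise Frechet bounds for a joint cdf H(y, z) with marginal cdfs F and G.

    f_y and g_z hold the marginal cdf values on the two grids (hypothetical inputs).
    Returns (lower, upper), each of shape (len(f_y), len(g_z)).
    """
    f = np.asarray(f_y, dtype=float)[:, None]  # column vector of F(y)
    g = np.asarray(g_z, dtype=float)[None, :]  # row vector of G(z)
    lower = np.maximum(0.0, f + g - 1.0)       # max(0, F(y) + G(z) - 1)
    upper = np.minimum(f, g)                   # min(F(y), G(z))
    return lower, upper

# Illustrative marginal cdf values, not taken from the article.
f_y = [0.2, 0.5, 1.0]
g_z = [0.3, 0.6, 1.0]

lower, upper = frechet_bounds(f_y, g_z)

# Under independence the joint cdf is the product of the marginals,
# which always lies between the two bounds.
independent = np.outer(f_y, g_z)

# One simple summary of uncertainty: the average width of the bounds over the grid.
mean_width = float(np.mean(upper - lower))
```

The wider the gap between `lower` and `upper`, the less the marginals alone say about the joint distribution. Extra information, such as logical constraints ruling out some combinations of values, can only narrow that gap.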
