New methods for small area estimation with linkage uncertainty

In Official Statistics, interest in data integration has grown steadily, driven by the need to extract information from different sources. However, the effects of these procedures on the validity of the resulting statistical analyses have long been disregarded. In recent years, it has been widely recognized that linkage is not an error-free procedure and that linkage errors, such as false links and missed links, can compromise the reliability of estimates from standard statistical models. In this paper we consider the general problem of making inference using data that have been probabilistically linked, and we explore the effect of potential linkage errors on the production of small area estimates. We describe the existing methods, then propose and compare new approaches from both a classical and a Bayesian perspective. We perform a simulation study to assess the pros and cons of each proposed method; our simulation scheme aims to reproduce a realistic context for both small area estimation and record linkage procedures.
