Inference Based on Estimating Equations and Probability-Linked Data

Ray Chambers, University of Wollongong
James Chipperfield, Australian Bureau of Statistics
Walter Davis, Statistics New Zealand
Milorad Kovacevic, Statistics Canada

Abstract

Data obtained through probabilistic linkage of administrative registers will contain errors, because some linked records combine data items sourced from different individuals. If ignored, such errors can bias standard statistical analyses. In this paper we describe approaches to eliminating this bias when parametric inference is based on the solution of an estimating equation, with an emphasis on linear and logistic regression analysis. We present simulation results that illustrate the gains from allowing for linkage error when these analyses are carried out on probabilistically linked data, together with extensions of the approach to more complex linkage situations. In particular, we explore issues that arise when sample records are linked to administrative records, and when the target of inference is the solution to the estimating equation defined by the perfectly linked data. We also describe a substantial application that uses these ideas to identify the major sources of error when modeling data obtained by probabilistically linking two successive Australian censuses.
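To make the bias concrete, here is a minimal Python sketch for the linear regression case. It is an illustration under stated assumptions, not the authors' implementation. It assumes an exchangeable linkage error model: each linked response comes from the correct record with a known probability lam, and otherwise from one of the other n - 1 records chosen with equal probability. The value of lam, the simulated data and the helper names (simulate_linked_data, naive_ols, linkage_adjusted) are all hypothetical. Under this assumption E(y*) = AXβ, where A = (λ - γ)I + γ11ᵀ and γ = (1 - λ)/(n - 1). Solving the adjusted estimating equation Xᵀ(y* - AXβ) = 0 therefore removes the bias that affects the naive equation Xᵀ(y* - Xβ) = 0, which is ordinary least squares.

```python
import numpy as np


def simulate_linked_data(n, lam, beta, rng):
    """Hypothetical simulation: y is correctly linked with probability lam,
    otherwise it is copied from a different record chosen uniformly at random."""
    x = rng.uniform(0.0, 10.0, size=n)
    X = np.column_stack([np.ones(n), x])
    y = X @ beta + rng.normal(0.0, 1.0, size=n)
    y_star = y.copy()
    wrong = rng.random(n) >= lam            # records that receive a false link
    for i in np.flatnonzero(wrong):
        j = rng.integers(0, n - 1)          # draw from n - 1 candidates ...
        if j >= i:
            j += 1                          # ... skipping record i itself
        y_star[i] = y[j]
    return X, y_star


def naive_ols(X, y_star):
    """Solve X^T (y* - X b) = 0, ignoring linkage error."""
    return np.linalg.solve(X.T @ X, X.T @ y_star)


def linkage_adjusted(X, y_star, lam):
    """Solve X^T (y* - A X b) = 0 under the assumed exchangeable model,
    with A = (lam - gamma) I + gamma 1 1^T (A is never formed explicitly)."""
    n = X.shape[0]
    gamma = (1.0 - lam) / (n - 1)
    col_sums = X.sum(axis=0)                # X^T 1
    XtAX = (lam - gamma) * (X.T @ X) + gamma * np.outer(col_sums, col_sums)
    return np.linalg.solve(XtAX, X.T @ y_star)


rng = np.random.default_rng(2024)
beta = np.array([1.0, 2.0])
X, y_star = simulate_linked_data(5000, 0.8, beta, rng)
print("naive   :", naive_ols(X, y_star))
print("adjusted:", linkage_adjusted(X, y_star, 0.8))
```

With these settings the naive slope is attenuated toward roughly λβ₁, while the adjusted estimate should lie close to the true coefficients. The sketch treats λ as known. In practice it would have to be estimated from the linkage process, and that added uncertainty is not reflected here.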