Integrated methodology for multiple systems estimation and record linkage using a missing data formulation

There are now three essentially separate literatures on multiple systems estimation, record linkage, and missing data, yet in practice the three topics are intimately intertwined. For example, record linkage across multiple data sources for human populations is often carried out with the express goal of producing a merged database for multiple systems estimation (MSE). Similarly, both the record linkage and MSE problems can be viewed as problems of estimating missing data. This presentation highlights the technical nature of these interrelationships and offers a preliminary effort at their integration.
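
As a minimal illustration of the missing-data view of MSE (not the integrated methodology of this presentation), consider the simplest two-list case. After linkage, every person falls into one cell of a 2x2 table according to presence on each list, and the cell for people on neither list is unobserved. The sketch below assumes error-free linkage and independence between the lists, in which case the missing cell is estimated as n10 * n01 / n11, which reproduces the classical Lincoln-Petersen estimate of the population size. The function name and the counts are hypothetical, chosen only for illustration.

```python
def missing_cell_estimate(n11, n10, n01):
    """Two-list estimate of the unobserved cell and the population total.

    n11: records linked across both lists
    n10: records found on list 1 only
    n01: records found on list 2 only
    Assumes the two lists are independent and linkage is error-free.
    """
    if n11 <= 0:
        raise ValueError("need at least one record linked across both lists")
    n00_hat = n10 * n01 / n11            # estimated count missed by both lists
    n_hat = n11 + n10 + n01 + n00_hat    # estimated total population size
    return n00_hat, n_hat


# Illustrative counts only: 60 linked, 40 on list 1 only, 90 on list 2 only.
n00_hat, n_hat = missing_cell_estimate(n11=60, n10=40, n01=90)
print(n00_hat, n_hat)  # 60.0 250.0
```

Linkage errors change the observed counts n11, n10, and n01 directly, which is one concrete way the record linkage and MSE problems become entangled.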
