Multiple system estimation using covariates having missing values and measurement error: Estimating the size of the Māori population in New Zealand

We investigate the use of two or more linked registers, or lists, both to estimate population size and to study the relationship between variables that appear on all or only some of the registers. This relationship is usually not fully known, because some individuals appear in only some registers and some appear in none. These two problems have previously been solved simultaneously using the EM algorithm. We extend this approach to estimate the size of the indigenous Māori population in New Zealand, which leads to several innovations: (1) the approach is extended to four registers (including the population census), where the reporting of Māori status differs between registers; (2) some individuals in one or more registers have missing ethnicity, and we adapt the approach to handle this additional missingness; (3) some registers cover only subsets of the population by design, and we discuss the assumptions under which such structural undercoverage can be ignored and provide a general result; (4) we treat the Māori indicator in each register as a variable measured with error and embed a latent class model in the multiple system estimation to estimate the population size classified by a latent variable, interpreted as true Māori status. Finally, we discuss estimating the Māori population size from administrative data only. Supplementary materials for our article are available online.
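The core EM idea, which treats both the never-observed cell (people in no register) and missing covariate values as missing data, can be illustrated in a two-register setting. The following is a minimal sketch under illustrative assumptions, not the four-register latent class model described above. It assumes a binary covariate X (for instance a recorded Māori indicator) that takes the same value in both registers and is missing at random for some linked records. It also assumes the log-linear model [AX][BX], in which registration in A and in B are independent given X. The function name `em_dual_system` and all counts are hypothetical.

```python
import numpy as np


def em_dual_system(known, missing_x, n_iter=2000, tol=1e-10):
    """EM sketch for two registers A, B and a binary covariate X (hypothetical helper).

    known[a, b, x]  -- counts of linked records with X observed
    missing_x[a, b] -- counts of linked records with X missing (assumed MAR)
    The (a=0, b=0) cells are never observed; whatever is stored there is ignored.
    Model (an assumption for this sketch): [AX][BX], A and B independent given X.
    Returns the fitted table m[a, b, x], including the estimated unobserved cells.
    """
    known = np.asarray(known, dtype=float)
    missing_x = np.asarray(missing_x, dtype=float)
    m = np.ones((2, 2, 2))  # flat starting values
    for _ in range(n_iter):
        # E-step (1): spread records with missing X over x = 0, 1
        # in proportion to the current fitted distribution of X in that cell.
        share = m / m.sum(axis=2, keepdims=True)
        full = known + missing_x[:, :, None] * share
        # E-step (2): the never-observed cell gets its current expected count.
        full[0, 0, :] = m[0, 0, :]
        # M-step: closed-form ML fit of [AX][BX] to the completed table,
        # m[a, b, x] = n[a, +, x] * n[+, b, x] / n[+, +, x].
        n_ax = full.sum(axis=1)        # indexed [a, x]
        n_bx = full.sum(axis=0)        # indexed [b, x]
        n_x = full.sum(axis=(0, 1))    # indexed [x]
        m_new = n_ax[:, None, :] * n_bx[None, :, :] / n_x
        converged = np.max(np.abs(m_new - m)) < tol
        m = m_new
        if converged:
            break
    return m


# Toy counts (hypothetical, for illustration only), indexed [a, b, x].
known = np.zeros((2, 2, 2))
known[1, 1] = [300, 100]  # in both registers
known[1, 0] = [150, 50]   # in A only
known[0, 1] = [100, 40]   # in B only
missing_x = np.array([[0.0, 14.0],   # [0, 1]: in B only, X missing
                      [20.0, 0.0]])  # [1, 0]: in A only, X missing

m = em_dual_system(known, missing_x)
print(round(m.sum(), 1))  # estimated population size: 858.7
```

At convergence, the estimated count of people missed by both registers in stratum x equals m[1,0,x]·m[0,1,x]/m[1,1,x], the stratum-wise dual-system (Petersen) estimate. Here the records with missing X have been allocated to the strata along the way. More registers, structural undercoverage, and a latent class for true Māori status all extend this same E-step/M-step pattern, but none of them is attempted in this sketch.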
