Estimation of population size based on capture recapture designs and evaluation of the estimation reliability

arXiv:2105.05373v1 [math.ST] 12 May 2021

We propose a modern method to estimate population size based on capture-recapture designs of K samples. The observed data are formulated as a sample of n i.i.d. K-dimensional vectors of binary indicators, where the k-th component of each vector indicates whether the subject was caught by the k-th sample, and only subjects with nonzero capture vectors are observed. The target quantity is the unconditional probability that the vector is nonzero, taken across both observed and unobserved subjects. We cover models that assume a single general constraint on the K-dimensional distribution, such that the target quantity is identified and the statistical model is unrestricted. We present solutions for general linear constraints, as well as for constraints commonly assumed to identify capture-recapture models, including no K-way interaction in linear and log-linear models, independence, and conditional independence. We demonstrate that the choice of constraint (identification assumption) has a dramatic impact on the value of the estimand, showing that it is crucial that the constraint is known to hold by design. For the commonly assumed constraint of no K-way interaction in a log-linear model, the statistical target parameter is only defined when each of the 2^K − 1 observable capture patterns is present, and it therefore suffers from the curse of dimensionality. We propose a targeted MLE based on an undersmoothed lasso model to smooth across the cells while targeting the fit towards the single-valued target parameter of interest. For each identification assumption, we provide simulated inference and confidence intervals to assess the performance of the estimator under correct and incorrect identifying assumptions. We apply the proposed method, alongside existing estimators, to estimate the prevalence of a parasitic infection using multi-source surveillance data from a region in southwestern China, under the four identification assumptions.
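
To make the log-linear case concrete, the following is a minimal sketch of a naive plug-in estimator under the no K-way interaction constraint. It is not the targeted MLE proposed in the paper. It assumes the standard form of the constraint, in which the alternating sum of log cell probabilities over all 2^K capture patterns is zero, so that the probability of a nonzero capture vector is 1 / (1 + R), where R is the product of the observed probabilities of patterns with an odd number of captures divided by the product of those with an even, nonzero number. The function name and the input layout (a dictionary from capture patterns to counts) are hypothetical choices made for illustration.

```python
from itertools import product
from math import prod


def plugin_population_size(counts, K):
    """Naive plug-in estimate under no K-way log-linear interaction.

    counts: dict mapping nonzero K-tuples of 0/1 to observed counts
            (hypothetical input layout, chosen for this sketch).
    Returns (psi, N_hat): the estimated probability of a nonzero capture
    vector and the implied population size n / psi.
    """
    n = sum(counts.values())
    # All 2^K - 1 observable (nonzero) capture patterns.
    patterns = [b for b in product((0, 1), repeat=K) if any(b)]
    # The plug-in is undefined as soon as one observable cell is empty.
    if any(counts.get(b, 0) == 0 for b in patterns):
        raise ValueError("all 2^K - 1 observable patterns must be present")
    # Empirical cell probabilities, conditional on being observed.
    odd = prod(counts[b] / n for b in patterns if sum(b) % 2 == 1)
    even = prod(counts[b] / n for b in patterns if sum(b) % 2 == 0)
    ratio = odd / even
    psi = 1.0 / (1.0 + ratio)
    return psi, n / psi


# Two samples: 30 caught only by the first, 20 only by the second, 10 by both.
psi, n_hat = plugin_population_size({(1, 0): 30, (0, 1): 20, (1, 1): 10}, K=2)
```

With K = 2 this reduces to the classical Lincoln-Petersen estimate (here 40 × 30 / 10 = 120). The explicit check for empty cells reflects the point made above: the number of cells that must all be observed grows as 2^K − 1, which is what motivates smoothing across cells.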
