A random-censoring Poisson model for underreported data

A major challenge when monitoring risks in socially deprived areas of underdeveloped countries is that economic, epidemiological, and social data are typically underreported. Statistical models that ignore data quality will therefore produce biased estimates. To deal with this problem, counts in suspect regions are usually treated as censored observations. The censored Poisson model can be used, but it requires all censored regions to be precisely known a priori, which is not a reasonable assumption in most practical situations. We introduce the random-censoring Poisson model (RCPM), which accounts for uncertainty about both the count and the data-reporting processes. Consequently, for each region, we can estimate the relative risk of the event of interest as well as the censoring probability. To facilitate posterior sampling, we propose a Markov chain Monte Carlo scheme based on the data augmentation technique. We run a simulation study comparing the proposed RCPM with two competing models under different scenarios. The RCPM and the censored Poisson model are applied to account for potential underreporting of early neonatal mortality counts in regions of Minas Gerais State, Brazil, where data quality is known to be poor.
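To make the censoring idea concrete, the following is a minimal sketch of a censored Poisson log-likelihood given known censoring indicators, plus a toy simulator of underreported counts. It is not the paper's specification. It assumes that a censored region's observed count is a lower bound on its true count, that the Poisson mean is the expected count times the relative risk, and that underreporting keeps a fixed fraction of events. The function names and the `report_frac` parameter are hypothetical.

```python
import math
import random


def poisson_logpmf(y, mu):
    """log P(Y = y) for Y ~ Poisson(mu), assuming mu > 0."""
    return y * math.log(mu) - mu - math.lgamma(y + 1)


def poisson_log_tail(y, mu):
    """log P(Y >= y) for Y ~ Poisson(mu), assuming mu > 0."""
    if y <= 0:
        return 0.0
    cdf_below = sum(math.exp(poisson_logpmf(k, mu)) for k in range(y))
    # Guard against a tiny negative value caused by rounding.
    return math.log(max(1.0 - cdf_below, 1e-300))


def censored_poisson_loglik(counts, censored, expected, theta):
    """Log-likelihood when censoring indicators are treated as known.

    Assumption: a censored count y contributes P(Y >= y), i.e. the
    reported value is only a lower bound on the true count.
    """
    total = 0.0
    for y, is_cens, e, t in zip(counts, censored, expected, theta):
        mu = e * t  # expected count times relative risk (assumed form)
        total += poisson_log_tail(y, mu) if is_cens else poisson_logpmf(y, mu)
    return total


def sample_poisson(mu, rng):
    """Draw one Poisson(mu) variate with Knuth's multiplication method."""
    limit = math.exp(-mu)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def simulate_underreported(expected, theta, censor_prob, report_frac, rng):
    """Toy generator: each region is censored with its own probability.

    report_frac is a hypothetical fraction of events that still get
    recorded in a censored region; it is not taken from the paper.
    """
    true_counts, observed, censored = [], [], []
    for e, t, p in zip(expected, theta, censor_prob):
        y_true = sample_poisson(e * t, rng)
        is_cens = rng.random() < p
        y_obs = math.floor(report_frac * y_true) if is_cens else y_true
        true_counts.append(y_true)
        observed.append(y_obs)
        censored.append(is_cens)
    return true_counts, observed, censored
```

In a random-censoring setting the indicators would not be known in advance. One plausible reading of the data augmentation scheme is that such indicators are treated as latent variables and sampled alongside the other parameters, with a likelihood of this kind evaluated conditionally on their current values.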
