An examination of the generalised pooled binomial distribution and its information properties

BEN O’NEILL AND ANGUS MCLURE, Australian National University

WRITTEN 8 DECEMBER 2020

Abstract

This paper examines the statistical properties of a distributional form that arises from pooled testing for the prevalence of a binary outcome. Our base distribution is a two-parameter distribution with a prevalence parameter and an excess intensity parameter; the latter allows for a dilution or intensification effect in larger pools. We also examine a generalised form of the distribution in which pools have covariate information that affects the prevalence through a linked linear form. We study the generalised pooled binomial distribution in its own right and as a special case of broader forms of binomial GLMs using the complementary log-log link function. We examine the information function and show the information content of individual sample items. We demonstrate that pooling reduces the information content of sample units, and we give simple heuristics for choosing an “optimal” pool size for testing. We derive the form of the log-likelihood function and its derivatives and give results for maximum likelihood estimation. We also discuss diagnostic testing of the positive pool probabilities, including testing for intensification or dilution in the testing mechanism. We illustrate the use of this distribution by applying it to pooled testing data on virus prevalence in a mosquito population.

POOLED TESTING; GROUP TESTING; POOLED BINOMIAL DISTRIBUTION; UNIT INFORMATION; BINOMIAL GLM; COMPLEMENTARY LOG-LOG LINK FUNCTION.
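The connection between pooled testing and the complementary log-log link can be illustrated with the standard pooled binomial case. The sketch below is not the paper's generalised model: it assumes a perfect test, independent items, and no dilution or intensification effect, and it omits the excess intensity parameter because its parameterisation is not given in this abstract. All function names are hypothetical.

```python
import math


def pool_positive_prob(prevalence: float, pool_size: int) -> float:
    """Probability that a pool tests positive.

    Assumes a perfect test and independent items (no dilution effect),
    so a pool is negative only if every item in it is negative.
    """
    return 1.0 - (1.0 - prevalence) ** pool_size


def cloglog(p: float) -> float:
    """Complementary log-log transform, log(-log(1 - p)), for 0 < p < 1."""
    return math.log(-math.log(1.0 - p))


def mle_prevalence_equal_pools(n_positive: int, n_pools: int, pool_size: int) -> float:
    """Closed-form MLE of prevalence when all pools share one size.

    Obtained by inverting the positive-pool probability at the observed
    proportion of positive pools.
    """
    return 1.0 - (1.0 - n_positive / n_pools) ** (1.0 / pool_size)


if __name__ == "__main__":
    theta, s = 0.02, 10
    p = pool_positive_prob(theta, s)
    # On the cloglog scale, pool size enters as an additive offset log(s).
    print(cloglog(p), math.log(s) + cloglog(theta))
    print(mle_prevalence_equal_pools(n_positive=6, n_pools=50, pool_size=s))
```

Under these assumptions the pool size appears only as the offset log(s) on the complementary log-log scale, which is why this link function is the natural choice for a binomial GLM on pooled data.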
