Thirty years of progeny from Chao’s inequality: Estimating and comparing richness with incidence data and incomplete sampling

In the context of capture-recapture studies, Chao (1987) derived an inequality among capture frequency counts to obtain a lower bound for the size of a population based on individuals’ capture/non-capture records over multiple capture occasions. The inequality has been applied to obtain a non-parametric lower bound for the species richness of an assemblage based on species incidence (detection/non-detection) data in multiple sampling units. The inequality implies that the number of undetected species can be inferred from the incidence frequency counts of the uniques (species detected in only one sampling unit) and the duplicates (species detected in exactly two sampling units). In their pioneering paper, Colwell and Coddington (1994) gave the name “Chao2” to the resulting species richness estimator. (The “Chao1” estimator is a similar estimator based on species abundance data.) Since then, the Chao2 estimator has been applied in many research fields and has led to fruitful generalizations. Here, we first review Chao’s inequality under various models and discuss some related statistical inference questions: (1) Under what conditions is the Chao2 estimator an unbiased point estimator? (2) How many additional sampling units are needed to detect any arbitrary proportion (including 100%) of the Chao2 estimate of asymptotic species richness? (3) Can other incidence frequency counts be used to obtain similar lower bounds? We then show how the Chao2 estimator can also be used to guide a non-asymptotic analysis in which species richness estimates are compared for equally large or equally complete samples via sample-size-based and coverage-based rarefaction and extrapolation. We also review the generalization of Chao’s inequality to estimate species richness under other sampling-without-replacement schemes (e.g. a set of quadrats, each surveyed only once), to obtain a lower bound of undetected species shared between two or more assemblages, and to allow inferences about undetected phylogenetic richness (the total length of undetected branches of a phylogenetic tree connecting all species), with associated rarefaction and extrapolation. A small empirical dataset of Australian birds is used for illustration, with the online software SpadeR, iNEXT, and PhD.
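The incidence-based quantities named in the abstract can be sketched in a few lines of Python. This is a minimal, hypothetical sketch, not the implementation in SpadeR, iNEXT, or PhD: all function names are ours, and the formulas are the incidence-data forms as we understand them from the literature cited below, so check them against the original papers before relying on them. With T sampling units, S_obs detected species, Q1 uniques and Q2 duplicates, it assumes: the Chao2 estimate S_obs + ((T - 1)/T) Q1^2/(2 Q2), with the bias-corrected form ((T - 1)/T) Q1(Q1 - 1)/2 for the undetected term when Q2 = 0; the standard rarefaction formula for t <= T units; the Chao2-based extrapolation S_obs + Q̂0 [1 - (1 - Q1/(T Q̂0 + Q1))^(t - T)] for t > T; the inversion of that extrapolation to get the number of additional units needed to reach a proportion g of the estimate; and the usual incidence-based sample-coverage estimate 1 - (Q1/U)[(T - 1)Q1/((T - 1)Q1 + 2Q2)], where U is the total number of incidences. The sketch handles only g < 1, because the inverted formula diverges as g approaches 1; the 100% case is not covered here.

```python
from math import comb, log


def _q(freqs, k):
    """Number of species detected in exactly k sampling units (Q_k)."""
    return sum(1 for y in freqs if y == k)


def undetected_chao2(freqs, T):
    """Chao2-type estimate of the number of undetected species (Q0-hat).

    freqs: for each detected species, the number of the T sampling units
    in which it was detected (each value between 1 and T).
    """
    q1, q2 = _q(freqs, 1), _q(freqs, 2)  # uniques and duplicates
    factor = (T - 1) / T                  # small-sample correction term
    if q2 > 0:
        return factor * q1 ** 2 / (2 * q2)
    return factor * q1 * (q1 - 1) / 2     # bias-corrected form when Q2 = 0


def chao2(freqs, T):
    """Chao2 estimate of asymptotic richness: observed plus undetected."""
    return len(freqs) + undetected_chao2(freqs, T)


def expected_richness(freqs, T, t):
    """Expected richness in t sampling units (hypothetical helper).

    For integer t <= T this is sample-size-based rarefaction; for t > T it
    is Chao2-based extrapolation toward the asymptote.
    """
    if t <= T:
        # A species found in y of the T units is missed by a random subset
        # of t units with probability C(T - y, t) / C(T, t).
        return sum(1 - comb(T - y, t) / comb(T, t) for y in freqs)
    q0, q1 = undetected_chao2(freqs, T), _q(freqs, 1)
    if q0 == 0:
        return float(len(freqs))
    return len(freqs) + q0 * (1 - (1 - q1 / (T * q0 + q1)) ** (t - T))


def additional_units(freqs, T, g):
    """Additional units needed to detect a proportion g (0 <= g < 1) of the
    Chao2 estimate, found by inverting the extrapolation formula."""
    if not 0 <= g < 1:
        raise ValueError("this sketch handles only 0 <= g < 1")
    s_obs = len(freqs)
    q0, q1 = undetected_chao2(freqs, T), _q(freqs, 1)
    target = g * (s_obs + q0)
    if target <= s_obs:
        return 0.0  # the target proportion is already reached
    return log(1 - (target - s_obs) / q0) / log(1 - q1 / (T * q0 + q1))


def sample_coverage(freqs, T):
    """Estimated coverage of the reference sample (assumes T >= 2)."""
    q1, q2, u = _q(freqs, 1), _q(freqs, 2), sum(freqs)  # u = total incidences
    if q1 == 0:
        return 1.0
    return 1 - (q1 / u) * ((T - 1) * q1 / ((T - 1) * q1 + 2 * q2))
```

For example, with T = 10 sampling units and made-up incidence frequencies [1, 1, 1, 1, 2, 2, 3, 5, 8, 10] (not the Australian bird data), Q1 = 4 and Q2 = 2, so the Chao2 estimate is 10 + 0.9 x 16/4 = 13.6 species. Rarefaction and extrapolation share one function so that the curve meets the observed richness at t = T.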

[1] G. Hughes, et al. Using the beta-binomial distribution to describe aggregated patterns of disease incidence, 1993.

[2] Shigeo Takahashi, et al. A measure for spatial heterogeneity of a grassland vegetation based on the beta-binomial distribution, 2000.

[3] M. Hill. Diversity and Evenness: A Unifying Notation and Its Consequences, 1973.

[4] Anne E. Magurran, et al. Biological Diversity: Frontiers in Measurement and Assessment, 2011.

[5] D. Faith. Conservation evaluation and phylogenetic diversity, 1992.

[6] Elizabeth L. Sander, et al. Rarefaction and extrapolation with Hill numbers: a framework for sampling and estimation in species diversity studies, 2014.

[7] A. Chao, et al. An improved nonparametric lower bound of species richness via a modified Good–Turing frequency formula, 2014, Biometrics.

[8] M. Willig, et al. Randomness, Area, and Species Richness, 1982.

[9] Anne Chao, et al. Measuring and Estimating Species Richness, Species Diversity, and Biotic Similarity from Sampling Data, 2013.

[10] B. Lindsay, et al. Estimating the number of classes, 2007, arXiv:0708.2153.

[11] W. Jetz, et al. The global diversity of birds in space and time, 2012, Nature.

[12] D. Böhning, et al. A covariate adjustment for zero-truncated approaches to estimating the size of hidden and elusive populations, 2009, arXiv:0908.2296.

[13] C. Mao, et al. On Population Size Estimators in the Poisson Mixture Model, 2013, Biometrics.

[14] Kevin J. Gaston, et al. Functional diversity (FD), species richness and community composition, 2002.

[15] Daniel Simberloff, et al. Properties of the Rarefaction Diversity Measurement, 1972, The American Naturalist.

[16] Robert K. Colwell, et al. Statistical methods for estimating species richness of woody regeneration in primary and secondary rain forests of northeastern Costa Rica, 1998.

[17] A. Chao, et al. Phylogenetic diversity measures based on Hill numbers, 2010, Philosophical Transactions of the Royal Society B: Biological Sciences.

[18] Louis-Paul Rivest, et al. Applications and extensions of Chao's moment estimator for the size of a closed population, 2007, Biometrics.

[19] C. C. Kokonendji, et al. Non‐parametric Estimation of the Number of Zeros in Truncated Count Distributions, 2018.

[20] A. Chao, et al. Coverage-based rarefaction and extrapolation: standardizing samples by completeness rather than size, 2012, Ecology.

[21] G. Belle, et al. Explicit Calculation of the Rarefaction Diversity Measurement and the Determination of Sufficient Sample Size, 1975.

[22] A. Magurran, et al. Measuring Biological Diversity, 2004.

[23] Anne Chao, et al. Sufficient sampling for asymptotic minimum species richness estimators, 2009, Ecology.

[24] H. L. Sanders, et al. Marine Benthic Diversity: A Comparative Study, 1968, The American Naturalist.

[25] Anne Chao, et al. An overview of closed capture-recapture models, 2001.

[26] Robert K. Colwell, et al. Abundance‐Based Similarity Indices and Their Estimation When There Are Unseen Species in Samples, 2006, Biometrics.

[27] Campbell O. Webb, et al. Phylomatic: tree assembly for applied phylogenetics, 2005.

[28] J. G. Skellam. A Probability Distribution Derived from the Binomial Distribution by Regarding the Probability of Success as Variable between the Sets of Trials, 1948.

[29] Robert K. Colwell, et al. EstimateS turns 20: statistical estimation of species richness and shared species from samples, with non‐parametric extrapolation, 2014.

[30] C. Mao. Lower Bounds to the Population Size When Capture Probabilities Vary over Individuals, 2008.

[31] Robert K. Colwell, et al. Interpolating, Extrapolating, and Comparing Incidence-Based Species Accumulation Curves, 2004.

[32] A. Chao. Species Estimation and Applications, 2006.

[33] Paulo A. V. Borges, et al. A new frontier in biodiversity inventory: a proposal for estimators of phylogenetic and functional diversity, 2014.

[34] A. Chao, et al. Rarefaction and Extrapolation: Making Fair Comparison of Abundance‐Sensitive Phylogenetic Diversity among Multiple Assemblages, 2016, Systematic Biology.

[35] A. Chao, et al. Estimating population size for capture-recapture data when capture probabilities vary by time and individual animal, 1992, Biometrics.

[36] D. Böhning, et al. A Generalization of Chao's Estimator for Covariate Information, 2013, Biometrics.

[37] S. Hurlbert. The Nonconcept of Species Diversity: A Critique and Alternative Parameters, 1971, Ecology.

[38] I. Good. Good Thinking: The Foundations of Probability and Its Applications, 1983.

[39] Robert K. Colwell, et al. Models and estimators linking individual-based and sample-based rarefaction, extrapolation and comparison of assemblages, 2012.

[40] I. Good. The Population Frequencies of Species and the Estimation of Population Parameters, 1953.

[41] Robert K. Colwell, et al. Quantifying biodiversity: procedures and pitfalls in the measurement and comparison of species richness, 2001.

[42] Todd H. Oakley, et al. Using Phylogenetic, Functional and Trait Diversity to Understand Patterns of Plant Community Productivity, 2009, PLoS ONE.

[43] Anne Chao, et al. Unifying Species Diversity, Phylogenetic Diversity, Functional Diversity, and Related Similarity and Differentiation Measures Through Hill Numbers, 2014.

[44] J. Cavender-Bares, et al. Integrating ecology and phylogenetics: the footprint of history in modern‐day communities, 2012.

[45] Chang Xuan Mao, et al. Inference on the Number of Species Through Geometric Lower Bounds, 2006.

[46] A. Chao. Estimating the population size for capture-recapture data with unequal catchability, 1987, Biometrics.

[47] Sharon Bertsch McGrayne, et al. The Theory That Would Not Die: How Bayes' Rule Cracked the Enigma Code, Hunted Down Russian Submarines, and Emerged Triumphant from Two Centuries of Controversy, 2011.

[48] R. Crozier. Preserving the Information Content of Species: Genetic Diversity, Phylogeny, and Conservation Worth, 1997.

[49] K. R. Clarke, et al. New 'biodiversity' measures reveal a decrease in taxonomic distinctness with increasing stress, 1995.