Nonparametric estimation of Shannon’s index of diversity when there are unseen species in sample

A biological community usually has a large number of species with relatively small abundances. When a random sample of individuals is selected and each individual is classified according to species identity, some rare species may not be discovered. This paper is concerned with the estimation of Shannon’s index of diversity when the number of species and the species abundances are unknown. The traditional estimator, which ignores the missing species, underestimates the index when there is a non-negligible number of unseen species. Because species have different probabilities of being discovered in the sample, we propose a different approach based on unequal probability sampling theory. No parametric forms are assumed for the species abundances. The proposed estimation procedure combines the Horvitz–Thompson (1952) adjustment for missing species with the concept of sample coverage, which is used to properly estimate the relative abundances of the species discovered in the sample. Simulation results show that the proposed estimator works well under various abundance models, even when a relatively large fraction of the species is missing. Three real data sets, two from biology and one from numismatics, are given for illustration.
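
As a rough illustration of how the two ingredients combine, the sketch below implements the coverage-adjusted Horvitz–Thompson form in which this estimator is commonly stated. The abstract itself gives no formula, so the specific expressions are assumptions: sample coverage is estimated as 1 − f1/n (f1 is the number of species seen exactly once, n the sample size), the observed relative abundances are shrunk by that coverage, and each term of the entropy sum is divided by the estimated probability that the species appears in the sample. The function names and the handling of the all-singletons case are illustrative choices, not taken from the source.

```python
import math


def plug_in_shannon(counts):
    """Traditional estimator: plugs observed relative abundances into
    -sum(p * log p) and ignores species that were never observed."""
    observed = [x for x in counts if x > 0]
    n = sum(observed)
    return -sum((x / n) * math.log(x / n) for x in observed)


def coverage_adjusted_shannon(counts):
    """Sketch of a coverage-adjusted Horvitz-Thompson estimator of
    Shannon's index (assumed form, see the lead-in above)."""
    observed = [x for x in counts if x > 0]
    n = sum(observed)

    # f1 = number of species represented by exactly one individual.
    f1 = sum(1 for x in observed if x == 1)
    if f1 == n:
        # Assumption: if every individual is a singleton, the coverage
        # estimate would be zero; nudge f1 so it stays positive.
        f1 = n - 1

    # Estimated sample coverage (Good-Turing style): 1 - f1 / n.
    coverage = 1.0 - f1 / n

    total = 0.0
    for x in observed:
        # Coverage-adjusted relative abundance of an observed species.
        p = coverage * x / n
        # Estimated probability that this species appears in a sample of
        # size n; dividing by it is the Horvitz-Thompson adjustment.
        inclusion = 1.0 - (1.0 - p) ** n
        total -= p * math.log(p) / inclusion
    return total
```

On a small hypothetical sample such as `[10, 5, 3, 1, 1]`, the adjusted value comes out larger than the plug-in value, which is the direction of correction the abstract describes. When there are no singletons and all counts are large, the two estimators nearly coincide.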

[1] D. Horvitz, et al. A Generalization of Sampling Without Replacement from a Finite Universe, 1952.

[2] I. Good. The Population Frequencies of Species and the Estimation of Population Parameters, 1953.

[3] R. MacArthur. On the Relative Abundance of Bird Species, 1957, Proceedings of the National Academy of Sciences of the United States of America.

[4] G. Basharin. On a Statistical Estimate for the Entropy of a Sequence of Independent Random Variables, 1959.

[5] Daniel H. Janzen. Sweep Samples of Tropical Foliage Insects: Description of Study Sites, With Data on Species Abundances and Size Distributions, 1973.

[6] D. Janzen. Sweep Samples of Tropical Foliage Insects: Effects of Seasons, Vegetation Types, Elevation, Time of Day, and Insularity, 1973.

[7] L. R. Shenton, et al. Some moments of an estimate of Shannon's measure of information, 1974.

[8] Robert K. Peet, et al. The Measurement of Species Diversity, 1974.

[9] L. A. Batten. Bird communities of some Killarney woodlands, 1976.

[10] Woollcott Smith, et al. Sampling Properties of a Family of Diversity Measures, 1977.

[11] S. Zahl, et al. Jackknifing an Index of Diversity, 1977.

[12] B. Mandelbrot, et al. Fractals: Form, Chance and Dimension, 1978.

[13] S. Engen, et al. Stochastic abundance models, 1978.

[14] L. Holst. Some Asymptotic Results for Incomplete Multinomial or Poisson Samples, 1981.

[15] Warren W. Esty, et al. The Efficiency of Good's Nonparametric Coverage Estimator, 1986.

[16] A. Magurran. Ecological Diversity and Its Measurement, 1988, Springer Netherlands.

[17] A. Chao, et al. Estimating the Number of Classes via Sample Coverage, 1992.

[18] Andrew R. Solow, et al. A simple test for change in community structure, 1993.

[19] J. Bunge, et al. Estimating the Number of Species: A Review, 1993.

[20] A. Chao, et al. Stopping rules and estimation for recapture debugging with unequal failure rates, 1993.

[21] Robert K. Colwell, et al. Estimating terrestrial biodiversity through extrapolation, 1994, Philosophical Transactions of the Royal Society of London. Series B, Biological Sciences.

[22] J. Bunge, et al. Comparison of three estimators of the number of species, 1995.

[23] P. Haas, et al. Estimating the Number of Classes in a Finite Population, 1998.

[24] A. Chao, et al. Estimating the Number of Shared Species in Two Communities, 2000.

[25] I. Goudie, et al. Coverage-adjusted estimators for mark-recapture in heterogeneous populations, 2000.

[26] Roger W. Johnson, et al. An Introduction to the Bootstrap, 2001.

[27] J. Norris, et al. Non-parametric MLE for Poisson species abundance models allowing for heterogeneity between species, 1998, Environmental and Ecological Statistics.