Nonparametric prediction in species sampling

Consider a continuous-time stochastic model in which species arrive in the sample according to independent Poisson processes with heterogeneous discovery rates. Given an initial survey, we address the problem of predicting the number of new species that additional sampling would discover. As the sampling time or sample size of the additional sample tends to infinity, this problem reduces to predicting the number of species undetected in the original sample or, equivalently, to estimating species richness. The problem arises in a wide range of disciplines. We propose a simple prediction method and apply it to two datasets: capture counts of Malayan butterflies and identification records of organic pollutants in a water environment. Simulation results assess the performance of the proposed method and compare it with existing estimators.
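The sampling model described above can be illustrated with a short sketch. It is not the proposed prediction method, which the abstract does not specify; it only encodes the Poisson arrival model under the assumption that the true rates are known. A species with rate λ is undetected in an initial survey of length T with probability exp(-λT), and is first seen in additional sampling of length t with probability exp(-λT)(1 - exp(-λt)). The function names, the lognormal choice of rates, and the values of T and t are all hypothetical.

```python
import math
import random


def expected_new_species(rates, T, t):
    """Expected number of species unseen in [0, T] but seen in (T, T + t]."""
    return sum(math.exp(-lam * T) * (1.0 - math.exp(-lam * t)) for lam in rates)


def expected_undetected(rates, T):
    """Expected number of species not detected in the initial survey [0, T]."""
    return sum(math.exp(-lam * T) for lam in rates)


def simulate_new_species(rates, T, t, rng):
    """One realization of the number of new species found in (T, T + t].

    For a Poisson process with rate lam, the first arrival time is
    exponentially distributed with the same rate.
    """
    new = 0
    for lam in rates:
        first_arrival = rng.expovariate(lam)
        if T < first_arrival <= T + t:
            new += 1
    return new


# Hypothetical community: 200 species with heterogeneous (lognormal) rates.
rng = random.Random(0)
rates = [rng.lognormvariate(0.0, 1.0) for _ in range(200)]

T, t = 1.0, 2.0  # assumed lengths of the initial and additional surveys
print("expected new species:", expected_new_species(rates, T, t))
print("one simulated count: ", simulate_new_species(rates, T, t, rng))
print("expected undetected: ", expected_undetected(rates, T))
```

As t grows, `expected_new_species` approaches `expected_undetected`, which mirrors the reduction to species richness estimation stated above. In practice the rates are unknown, so a predictor has to work from the observed frequency counts alone.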
