Inference on the Number of Species Through Geometric Lower Bounds

Estimation of the number of species in a population from a sample of individuals is investigated in a nonparametric Poisson mixture model. A sequence of lower bounds to the odds that a species is unseen in the sample is proposed from a geometric perspective. Each lower bound and its representing mixing distribution can be computed by linear programming with guaranteed convergence. These lower bounds can be estimated by the maximum likelihood method and used to construct lower confidence limits for the number of species by the bootstrap method. Computation of the nonparametric maximum likelihood estimator is discussed. Simulation is used to assess the performance of the estimated lower bounds and to compare them with several existing estimators. A genomic application is investigated.
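To make the idea of a lower bound on the unseen odds concrete, the sketch below computes the classical moment-based bound usually attributed to Chao (1984). It is a stand-in for illustration and not the sequence of geometric bounds proposed here, nor its linear-programming computation. In a Poisson mixture, the Cauchy-Schwarz inequality gives p0 >= p1^2 / (2 p2), where pk is the probability that a species is seen k times; replacing probabilities by observed frequency counts f1 and f2 yields a plug-in lower bound f1^2 / (2 f2) for the number of unseen species. The function names and the toy data are assumptions made for this example.

```python
from collections import Counter


def frequency_counts(abundances):
    """Map k -> number of species observed exactly k times (k >= 1)."""
    return Counter(k for k in abundances if k > 0)


def chao_lower_bound(abundances):
    """Plug-in lower bound based on singletons (f1) and doubletons (f2).

    Returns (observed species, estimated lower bound for the unseen odds,
    estimated lower bound for the total number of species).
    Illustrative stand-in only; not the geometric bounds of the paper.
    """
    f = frequency_counts(abundances)
    s_obs = sum(f.values())          # species seen at least once
    f1, f2 = f.get(1, 0), f.get(2, 0)
    if f2 == 0:
        raise ValueError("bound is undefined when no species is seen exactly twice")
    unseen = f1 ** 2 / (2 * f2)      # estimated lower bound for unseen species
    odds = unseen / s_obs            # unseen species per observed species
    return s_obs, odds, s_obs * (1 + odds)


# Hypothetical toy sample: per-species counts (zeros are unseen and ignored).
sample = [1, 1, 1, 1, 2, 2, 3, 5, 0, 0]
s_obs, odds, total = chao_lower_bound(sample)
```

With this toy sample, four singletons and two doubletons among eight observed species give an estimated odds bound of 0.5 and a species-number bound of 12. A lower confidence limit would additionally require resampling, which is not shown.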
