A Penalized Nonparametric Maximum Likelihood Approach to Species Richness Estimation

We propose a class of penalized nonparametric maximum likelihood estimators (NPMLEs) for the species richness problem. The penalty term is needed because unpenalized likelihood estimators are extremely unstable. The estimators are constructed from a conditional likelihood that is simpler than the full likelihood. We show that the full-likelihood NPMLE of Norris and Pollock can be recovered to high accuracy by applying an appropriate penalty to the conditional likelihood, so it is a member of our class of estimators. We develop a simple and fast algorithm for the penalized NPMLE; it greatly speeds up computation of the unconditional NPMLE and can also be used to find profile mixture likelihoods. To attain high stability while retaining sensitivity, we propose an adaptive quadratic penalty function. A systematic simulation study covering a wide range of scenarios demonstrates the success of this method relative to its competitors. Finally, we discuss an application to gene number estimation using expressed sequence tag (EST) data from genomics.
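To make the ingredients above concrete, the following is a minimal sketch, under stated assumptions, of a conditional (zero-truncated) log-likelihood for a finite Poisson mixture, a penalized version of it, and the resulting richness estimate. The abstract does not give the form of the adaptive quadratic penalty, so the penalty used here (a fixed multiple `gamma` of the squared odds of a species going unseen) is a hypothetical placeholder and not the authors' penalty. The function names, the frequency-count input `freq`, and the fixed support points `lams` with weights `weights` are likewise illustrative assumptions, and no optimization over the mixing distribution is attempted.

```python
import math


def poisson_pmf(x, lam):
    """Poisson probability of count x with mean lam (lam must be > 0)."""
    return math.exp(-lam + x * math.log(lam) - math.lgamma(x + 1))


def mixture_pmf(x, lams, weights):
    """Probability of count x under a finite Poisson mixture."""
    return sum(w * poisson_pmf(x, lam) for lam, w in zip(lams, weights))


def conditional_loglik(freq, lams, weights):
    """Zero-truncated log-likelihood.

    freq maps a count x >= 1 to the number of species observed exactly
    x times; species with x = 0 are unobserved and so are conditioned out.
    """
    p0 = mixture_pmf(0, lams, weights)
    return sum(
        n * (math.log(mixture_pmf(x, lams, weights)) - math.log(1.0 - p0))
        for x, n in freq.items()
    )


def penalized_loglik(freq, lams, weights, gamma):
    """Conditional log-likelihood minus an illustrative quadratic penalty.

    ASSUMPTION: the penalty gamma * odds**2, with odds = p0 / (1 - p0),
    is a placeholder chosen for illustration only.
    """
    p0 = mixture_pmf(0, lams, weights)
    odds = p0 / (1.0 - p0)
    return conditional_loglik(freq, lams, weights) - gamma * odds ** 2


def richness_estimate(freq, lams, weights):
    """Estimated total number of species: observed count / P(seen at least once)."""
    observed = sum(freq.values())
    p0 = mixture_pmf(0, lams, weights)
    return observed / (1.0 - p0)


# Hypothetical frequency counts: 50 singletons, 20 doubletons, 10 tripletons.
freq = {1: 50, 2: 20, 3: 10}
lams, weights = [0.5, 2.0], [0.6, 0.4]
print(conditional_loglik(freq, lams, weights))
print(penalized_loglik(freq, lams, weights, gamma=1.0))
print(richness_estimate(freq, lams, weights))
```

In this sketch the penalty discourages mixing distributions that put a large probability on the unseen class, which is where the richness estimate becomes unstable: as the probability of a zero count approaches one, the estimate grows without bound.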

[1] Ji-Ping Z. Wang, et al. EST clustering error evaluation and correction, 2004, Bioinform.

[2] J. Norris, et al. Non-parametric MLE for Poisson species abundance models allowing for heterogeneity between species, 1998, Environmental and Ecological Statistics.

[3] Bruce G. Lindsay, et al. Tests and diagnostics for heterogeneity in the species problem, 2003, Comput. Stat. Data Anal.

[4] John Bunge, et al. Estimating the Number of Species in a Stochastic Abundance Model, 2002, Biometrics.

[5] S. Pledger. Unified Maximum Likelihood Estimates for Closed Capture–Recapture Models Using Mixtures, 2000, Biometrics.

[6] A. Chao, et al. Estimating the Number of Shared Species in Two Communities, 2000.

[7] Y. Nakamura, et al. A large scale analysis of cDNA in Arabidopsis thaliana: generation of 12,028 non-redundant expressed sequence tags from normalized and size-selected cDNA libraries, 2000, DNA research: an international journal for rapid publication of reports on genes and genomes.

[8] P. Haas, et al. Estimating the Number of Classes in a Finite Population, 1998.

[9] J. Norris, et al. Nonparametric MLE under Two Closed Capture-Recapture Models with Heterogeneity, 1996.

[10] B. Lindsay. Mixture models: theory, geometry, and applications, 1995.

[11] W. Heyer, et al. Estimating Population Size, 1994.

[12] K. Roeder, et al. Uniqueness of estimation and identifiability in mixture models, 1993.

[13] A. Chao, et al. Stopping rules and estimation for recapture debugging with unequal failure rates, 1993.

[14] J. Bunge, et al. Estimating the Number of Species: A Review, 1993.

[15] K. Roeder, et al. Residual diagnostics for mixture models, 1992.

[16] A. Chao, et al. Estimating the Number of Classes via Sample Coverage, 1992.

[17] A. Kerlavage, et al. Complementary DNA sequencing: expressed sequence tags and human genome project, 1991, Science.

[18] A. Kan, et al. A multinomial Bayesian approach to the estimation of population and vocabulary size, 1987.

[19] K. Roeder, et al. A Unified Treatment of Integer Parameter Models, 1987.

[20] D. Joanes, et al. Bayesian estimation of the number of species, 1984.

[21] G. Belle, et al. Nonparametric estimation of species richness, 1984.

[22] A. Chao. Nonparametric estimation of the number of classes in a population, 1984.

[23] B. Efron. Nonparametric standard errors and confidence intervals, 1981.

[24] K. Burnham, et al. Robust Estimation of Population Size When Capture Probabilities Vary Among Animals, 1979.

[25] Bruce M. Hill, et al. Posterior Moments of the Number of Species in a Finite Population and the Posterior Probability of Finding a New Species, 1979.

[26] David R. Anderson, et al. Statistical inference from capture data on closed animal populations, 1980.

[27] K. Burnham, et al. Estimation of the size of a closed population when capture probabilities vary among animals, 1978.

[28] L. Sanathanan. Estimating the Size of a Truncated Sample, 1977.

[29] S. Blumenthal. Estimating population size with truncated sampling, 1977.

[30] S. Blumenthal, et al. Estimating Population Size with Exponential Failure, 1975.

[31] Lalitha Sanathanan, et al. Estimating the Size of a Multinomial Population, 1972.

[32] R. Fisher, et al. The Relation Between the Number of Species and the Number of Individuals in a Random Sample of an Animal Population, 1943.