A Penalized Nonparametric Maximum Likelihood Approach to Species Richness Estimation

We propose a class of penalized nonparametric maximum likelihood estimators (NPMLEs) for the species richness problem. We penalize the likelihood because likelihood estimators without a penalty are extremely unstable. The estimators are constructed from a conditional likelihood that is simpler than the full likelihood. We show that the full-likelihood NPMLE solution given by Norris and Pollock can be found (with great accuracy) by using an appropriate penalty term on the conditional likelihood, so it is an element of our class of estimators. We develop a simple and fast algorithm for the penalized NPMLE; it can be used to greatly speed up computation of the unconditional NPMLE and to find profile mixture likelihoods. With the goal of attaining high stability while retaining sensitivity, we propose an adaptive quadratic penalty function. A systematic simulation study covering a wide range of scenarios establishes the success of this method relative to its competitors. Finally, we discuss an application to gene number estimation using expressed sequence tag (EST) data from genomics.
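The abstract does not give the estimator's formulas, so the following is only a minimal sketch under stated assumptions: species counts follow a Poisson mixture, and the mixing distribution is approximated on a fixed grid of candidate Poisson means (a common approximation to the NPMLE, not necessarily the authors' algorithm). It maximizes the conditional likelihood of the observed frequency counts by EM over a mixture of zero-truncated Poisson components, then converts the fit into a richness estimate. The penalty term, including the adaptive quadratic penalty proposed in the paper, is deliberately omitted because its form is not stated here. The function names (`truncated_poisson_pmf`, `conditional_loglik`, `fixed_grid_npmle`), the grid, and the frequency data are all hypothetical.

```python
import math


def truncated_poisson_pmf(x, lam):
    """Zero-truncated Poisson pmf P(X = x | X > 0) for X ~ Poisson(lam), x >= 1."""
    # Work on the log scale for the numerator to avoid overflow for large x.
    log_p = -lam + x * math.log(lam) - math.lgamma(x + 1)
    # -expm1(-lam) equals 1 - exp(-lam), computed accurately for small lam.
    return math.exp(log_p) / (-math.expm1(-lam))


def conditional_loglik(freq, grid, pi):
    """Conditional log-likelihood of the observed frequency counts.

    freq[x] is the number of species observed exactly x times (x >= 1).
    pi[j] is the weight of the zero-truncated component with mean grid[j].
    """
    total = 0.0
    for x, n_x in freq.items():
        mix = sum(p * truncated_poisson_pmf(x, lam) for p, lam in zip(pi, grid))
        total += n_x * math.log(mix)
    return total


def fixed_grid_npmle(freq, grid, n_iter=500):
    """Unpenalized EM for the mixing weights on a fixed grid (hypothetical sketch).

    Returns (pi, n_hat, trace), where trace holds the conditional
    log-likelihood after each EM step.
    """
    n = sum(freq.values())  # number of observed species
    pi = [1.0 / len(grid)] * len(grid)  # start from uniform weights
    trace = [conditional_loglik(freq, grid, pi)]
    for _ in range(n_iter):
        new_pi = [0.0] * len(grid)
        for x, n_x in freq.items():
            comps = [p * truncated_poisson_pmf(x, lam) for p, lam in zip(pi, grid)]
            denom = sum(comps)
            for j, c in enumerate(comps):
                # E-step: expected number of observed species from component j.
                new_pi[j] += n_x * c / denom
        # M-step: normalize expected counts into weights.
        pi = [v / n for v in new_pi]
        trace.append(conditional_loglik(freq, grid, pi))
    # Inflate each component's share by 1 / P(X > 0) to count unseen species.
    n_hat = n * sum(p / -math.expm1(-lam) for p, lam in zip(pi, grid))
    return pi, n_hat, trace


if __name__ == "__main__":
    # Hypothetical frequency counts, not data from the paper.
    freq = {1: 30, 2: 12, 3: 6, 4: 3, 5: 1}
    grid = [0.1, 0.5, 1.0, 2.0, 5.0]
    pi, n_hat, trace = fixed_grid_npmle(freq, grid)
    print(f"observed species: {sum(freq.values())}, estimated richness: {n_hat:.1f}")
```

Because each component's weight is divided by 1 - e^{-λ}, placing grid points close to zero lets the estimate grow very large when EM assigns them even modest weight. This is one way to see the instability that motivates penalizing the likelihood.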

B. Lindsay | Ji-ping Wang

[1] Ji-Ping Z. Wang, et al. EST clustering error evaluation and correction. Bioinformatics, 2004.

[2] J. Norris, et al. Non-parametric MLE for Poisson species abundance models allowing for heterogeneity between species. Environmental and Ecological Statistics, 1998.

[3] Bruce G. Lindsay, et al. Tests and diagnostics for heterogeneity in the species problem. Computational Statistics & Data Analysis, 2003.

[4] John Bunge, et al. Estimating the Number of Species in a Stochastic Abundance Model. Biometrics, 2002.

[5] S. Pledger. Unified Maximum Likelihood Estimates for Closed Capture–Recapture Models Using Mixtures. Biometrics, 2000.

[6] A. Chao, et al. Estimating the Number of Shared Species in Two Communities. 2000.

[7] Y. Nakamura, et al. A large scale analysis of cDNA in Arabidopsis thaliana: generation of 12,028 non-redundant expressed sequence tags from normalized and size-selected cDNA libraries. DNA Research, 2000.

[8] P. Haas, et al. Estimating the Number of Classes in a Finite Population. 1998.

[9] J. Norris, et al. Nonparametric MLE under Two Closed Capture-Recapture Models with Heterogeneity. 1996.

[10] B. Lindsay. Mixture Models: Theory, Geometry, and Applications. 1995.

[11] W. Heyer, et al. Estimating Population Size. 1994.