Inherent difficulties in nonparametric estimation of the cumulative distribution function using observations measured with error: Application to high-dimensional microarray data

Estimating a distribution function is important in many biological applications. A very simple example shows that, once normal measurement errors are added, data from very different underlying distributions can produce nearly identical distributions of observations. Consequently, in some situations it can be essentially impossible to estimate the underlying cumulative distribution function accurately from a reasonable number of observations measured with error. An application is given in which the distribution function of differential gene expression is estimated from more than fifty thousand genes.
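The phenomenon described above can be illustrated with a small numerical sketch. The choices below are assumptions made for illustration and are not taken from the paper: one underlying distribution puts mass 1/2 on each of -1 and +1, the other is standard normal, and both are observed with independent N(0, 1) error. The function names and the evaluation grid are likewise hypothetical. The code compares the largest vertical gap between the two underlying CDFs with the largest gap between the two CDFs of the error-contaminated observations.

```python
import math


def norm_cdf(x, mean=0.0, sd=1.0):
    """CDF of a normal distribution, computed from the error function."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


def two_point_cdf(x):
    """Underlying CDF A (assumed): mass 1/2 at -1 and mass 1/2 at +1."""
    if x < -1.0:
        return 0.0
    if x < 1.0:
        return 0.5
    return 1.0


def observed_cdf_two_point(y):
    """CDF of X + e with X two-point and e ~ N(0, 1): a two-component normal mixture."""
    return 0.5 * norm_cdf(y, mean=-1.0) + 0.5 * norm_cdf(y, mean=1.0)


def observed_cdf_normal(y):
    """CDF of X + e with X ~ N(0, 1) and e ~ N(0, 1): normal with variance 2."""
    return norm_cdf(y, sd=math.sqrt(2.0))


# Evaluate both pairs of CDFs on a fine grid (assumed range [-6, 6], step 0.001).
grid = [i / 1000.0 for i in range(-6000, 6001)]

# Largest gap between the two underlying (error-free) CDFs on the grid.
underlying_gap = max(abs(two_point_cdf(x) - norm_cdf(x)) for x in grid)

# Largest gap between the two CDFs of the observations measured with error.
observed_gap = max(
    abs(observed_cdf_two_point(y) - observed_cdf_normal(y)) for y in grid
)

print(f"max gap, underlying CDFs: {underlying_gap:.4f}")
print(f"max gap, observed CDFs:   {observed_gap:.4f}")
```

Under these assumptions the underlying CDFs differ by about 0.34 at their widest point, while the CDFs of the observations differ by less than 0.03, so a sample of moderate size would have difficulty telling the two scenarios apart.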
