Estimation of a two‐component mixture model with applications to multiple testing

We consider a two-component mixture model with one known component. We develop methods for estimating the mixing proportion and the other unknown distribution nonparametrically, given i.i.d. data from the mixture model. We use ideas from shape-restricted function estimation and develop “tuning-parameter-free” estimators that are easily implementable and have good finite-sample performance. We establish the consistency of our procedures. Distribution-free finite-sample lower confidence bounds are developed for the mixing proportion. The identifiability of the model and the estimation of the density of the unknown mixing distribution are also addressed. We discuss the connection with the problem of multiple testing and compare our procedure with some of the existing methods in that area through simulation studies. We also analyse two data sets, one arising from an application in astronomy and the other from a microarray experiment.
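A minimal Python sketch of a projection-based estimator of the mixing proportion follows. It illustrates the shape-restricted idea under stated assumptions and is not offered as the authors' exact procedure. Assume the model F = αF_s + (1 − α)F_b with F_b known. For any γ ≥ α, the function (F − (1 − γ)F_b)/γ = (α/γ)F_s + (1 − α/γ)F_b is a mixture of two CDFs and hence itself a CDF. The smallest γ with this property equals α unless F_s can be written as a mixture that puts positive weight on F_b, which is one way to see the identifiability issue. The sketch replaces F by the empirical CDF F_n, projects the plug-in function onto CDFs in the L2(F_n) norm (isotonic regression by pool-adjacent-violators, then clipping to [0, 1]), and returns the smallest grid value γ with γ·d_n(γ) ≤ c_n/√n, where d_n(γ) is the distance to that projection. The function names, the grid, and the threshold c_n = 0.1 log log n are assumptions of this sketch. In the multiple-testing application the observations are p-values, F_b is the Uniform(0, 1) CDF, and 1 − α is the proportion of true null hypotheses.

```python
import math
import random
from typing import Callable, List, Optional, Sequence


def pava(y: Sequence[float]) -> List[float]:
    """Least-squares nondecreasing fit to y with equal weights (pool-adjacent-violators)."""
    sums: List[float] = []   # running sum of each pooled block
    counts: List[int] = []   # number of points in each pooled block
    for v in y:
        sums.append(float(v))
        counts.append(1)
        # Merge the last two blocks while their means violate monotonicity.
        while len(sums) > 1 and sums[-2] / counts[-2] > sums[-1] / counts[-1]:
            s, c = sums.pop(), counts.pop()
            sums[-1] += s
            counts[-1] += c
    fit: List[float] = []
    for s, c in zip(sums, counts):
        fit.extend([s / c] * c)
    return fit


def distance_to_cdfs(xs: Sequence[float], gamma: float,
                     f_b: Callable[[float], float]) -> float:
    """d_n(gamma): L2(F_n) distance from the plug-in candidate to the set of CDFs.

    xs must be sorted; using F_n(xs[i]) = (i + 1) / n assumes no ties.
    """
    n = len(xs)
    # Plug-in candidate (F_n - (1 - gamma) F_b) / gamma at the order statistics.
    y = [((i + 1) / n - (1 - gamma) * f_b(x)) / gamma for i, x in enumerate(xs)]
    # Projection onto nondecreasing sequences in [0, 1]: isotonic fit, then clip.
    proj = [min(1.0, max(0.0, v)) for v in pava(y)]
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, proj)) / n)


def estimate_alpha(x: Sequence[float], f_b: Callable[[float], float],
                   grid_step: float = 0.01, c_n: Optional[float] = None) -> float:
    """Smallest grid value gamma with gamma * d_n(gamma) <= c_n / sqrt(n)."""
    n = len(x)
    xs = sorted(x)
    if c_n is None:
        c_n = 0.1 * math.log(math.log(n))  # assumed threshold choice (needs n >= 3)
    threshold = c_n / math.sqrt(n)
    steps = int(round(1 / grid_step))
    for k in range(1, steps + 1):
        gamma = k / steps
        if gamma * distance_to_cdfs(xs, gamma, f_b) <= threshold:
            return gamma
    return 1.0  # unreachable in exact arithmetic, since d_n(1) = 0


if __name__ == "__main__":
    rng = random.Random(0)
    n, alpha = 2000, 0.3
    # Simulated p-values: non-nulls uniform on [0, 0.1], nulls uniform on [0, 1].
    pvals = [0.1 * rng.random() if rng.random() < alpha else rng.random()
             for _ in range(n)]
    print(estimate_alpha(pvals, f_b=lambda t: t))  # F_b: Uniform(0, 1) CDF
```

The simulated non-null p-values (uniform on [0, 0.1]) have no Uniform(0, 1) component, so α is identifiable in that example. Because F_n is itself a CDF, d_n(1) = 0 and the grid search always stops by γ = 1; a finer `grid_step` trades run time for resolution.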
