Extending the Patra-Sen Approach to Estimating the Background Component in a Two-Component Mixture Model

Patra and Sen (2016) consider a two-component mixture model in which one component plays the role of background and the other the role of signal, and propose to estimate the background component by simply ‘maximizing’ its weight. While in their work the background component is a completely known distribution, we extend their approach here to three emblematic settings: when the background distribution is symmetric; when it is monotonic; and when it is log-concave. In each setting, we derive estimators for the background component, establish their consistency, and provide a confidence band. Estimating the background component is straightforward when it is taken to be symmetric or monotonic; when it is log-concave, estimation requires computing a largest concave minorant, which we implement using sequential quadratic programming. Compared with existing methods, ours requires much less prior knowledge about the background component and is thus less prone to model misspecification. We illustrate this methodology on a number of synthetic and real datasets.
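
To make the idea of ‘maximizing’ the background weight concrete, here is a minimal numerical sketch for the symmetric and monotonic settings. It is not the authors' implementation. It assumes that the mixture density f is known on a grid (in practice it would be replaced by a density estimate), that the center of symmetry is known, and that "monotonic" means non-increasing on the positive half-line. All function names and the toy mixtures are hypothetical. Under these assumptions the largest symmetric function lying below f is min(f(x), f(2c - x)), and the largest non-increasing function lying below f is its running minimum. The integral of either one is the maximal background weight, which can exceed the true weight because the model is not identifiable without further assumptions.

```python
import numpy as np

def normal_pdf(x, mean, sd):
    """Density of a normal distribution (used only to build the toy mixtures)."""
    return np.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))

def integrate(values, grid):
    """Trapezoidal rule on an increasing grid."""
    return float(np.sum(0.5 * (values[1:] + values[:-1]) * np.diff(grid)))

def largest_symmetric_minorant(f_vals, grid, center):
    """Largest function below f that is symmetric about `center`:
    h(x) = min(f(x), f(2*center - x)), with f taken as 0 off the grid."""
    reflected = np.interp(2.0 * center - grid, grid, f_vals, left=0.0, right=0.0)
    return np.minimum(f_vals, reflected)

def largest_decreasing_minorant(f_vals):
    """Largest non-increasing function below f on an increasing grid:
    the running minimum g(x) = min of f(y) over y <= x."""
    return np.minimum.accumulate(f_vals)

# Toy example 1 (assumed): symmetric background N(0, 1) with weight 0.7,
# signal N(3, 0.5^2) with weight 0.3; center of symmetry assumed known.
grid = np.linspace(-8.0, 8.0, 4001)
f_sym = 0.7 * normal_pdf(grid, 0.0, 1.0) + 0.3 * normal_pdf(grid, 3.0, 0.5)
h_sym = largest_symmetric_minorant(f_sym, grid, center=0.0)
w_sym = integrate(h_sym, grid)

# Toy example 2 (assumed): decreasing background Exp(1) with weight 0.6,
# signal N(3, 0.3^2) with weight 0.4, on the positive half-line.
grid_pos = np.linspace(0.0, 12.0, 4001)
f_mon = 0.6 * np.exp(-grid_pos) + 0.4 * normal_pdf(grid_pos, 3.0, 0.3)
h_mon = largest_decreasing_minorant(f_mon)
w_mon = integrate(h_mon, grid_pos)

print(f"symmetric case: maximal background weight ~ {w_sym:.3f}")
print(f"monotone case:  maximal background weight ~ {w_mon:.3f}")
```

In the symmetric toy example the signal barely overlaps the reflected background, so the maximal weight is essentially the true weight. In the monotone example the running minimum stays flat under the signal bump, so the maximal weight is somewhat larger than the true one. The log-concave setting is not covered by this sketch, since it involves a constrained optimization rather than a pointwise formula.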

[1]  C. Loader Local Likelihood Density Estimation , 1996 .

[2]  Zuofeng Shang,et al.  An MM algorithm for estimation of a two component semiparametric density mixture with a known component , 2018 .

[3]  J. Tukey A survey of sampling from contaminated distributions , 1960 .

[4]  T. Cai,et al.  Estimating the Null and the Proportion of Nonnull Effects in Large-Scale Multiple Comparisons , 2006, math/0611108.

[5]  S.,et al.  Consistent Cross-Validated Density Estimation , 2022 .

[6]  W. Yao,et al.  Flexible estimation of a semiparametric two-component mixture model with one parametric component , 2015 .

[7]  R. Nickl,et al.  Mathematical Foundations of Infinite-Dimensional Statistical Models , 2015 .

[8]  Chong Gu,et al.  Smoothing spline density estimation: theory , 1993 .

[9]  Mario Mateo,et al.  Velocity Dispersion Profiles of Seven Dwarf Spheroidal Galaxies , 2007, 0708.0010.

[10]  A. Robin,et al.  A synthetic view on structure and evolution of the Milky Way , 2003 .

[11]  A new algorithm for approximating the least concave majorant , 2016, 1608.02581.

[12]  P. Deb Finite Mixture Models , 2008 .

[13]  Y. Benjamini,et al.  On the Adaptive Control of the False Discovery Rate in Multiple Testing With Independent Statistics , 2000 .

[14]  Jie Mi,et al.  Robust Nonparametric Statistical Methods , 1999, Technometrics.

[15]  Hao Hu,et al.  Maximum likelihood estimation of the mixture of log-concave densities , 2016, Comput. Stat. Data Anal..

[16]  N. Meinshausen,et al.  Estimating the proportion of false null hypotheses among a large number of independently tested hypotheses , 2005, math/0501289.

[17]  M. C. Jones,et al.  A reliable data-based bandwidth selection method for kernel density estimation , 1991 .

[18]  L. Wasserman,et al.  A stochastic process approach to false discovery control , 2004, math/0406519.

[19]  L. Bordes,et al.  Semiparametric Estimation of a Two-Component Mixture Model , 2006, math/0607812.

[20]  John D. Storey A direct approach to false discovery rates , 2002 .

[21]  Jianqing Fan Local Linear Regression Smoothers and Their Minimax Efficiencies , 1993 .

[22]  D K Smith,et al.  Numerical Optimization , 2001, J. Oper. Res. Soc..

[23]  E. Arias-Castro,et al.  Distribution-free Multiple Testing , 2016, 1604.07520.

[24]  B. Lindsay,et al.  Multivariate Normal Mixtures: A Fast Consistent Method of Moments , 1993 .

[25]  B. Lindqvist,et al.  Estimating the proportion of true null hypotheses, with application to DNA microarray data , 2005 .

[26]  Yen-Chi Chen,et al.  Nonparametric inference via bootstrapping the debiased estimator , 2017, Electronic Journal of Statistics.

[27]  Eugene F. Schuster,et al.  Incorporating support constraints into nonparametric estimators of densities , 1985 .

[28]  Daren B. H. Cline,et al.  Kernel Estimation of Densities with Discontinuities or Discontinuous Derivatives , 1991 .

[29]  P. Gill,et al.  Sequential Quadratic Programming Methods , 2012 .

[30]  Guenther Walther,et al.  Clustering with mixtures of log-concave distributions , 2007, Comput. Stat. Data Anal..

[31]  Jiashun Jin Proportion of non‐zero normal means: universal oracle equivalences and uniformly consistent estimators , 2008 .

[32]  P. J. Green,et al.  Density Estimation for Statistics and Data Analysis , 1987 .

[33]  Roger E Bumgarner,et al.  Cellular Gene Expression upon Human Immunodeficiency Virus Type 1 Infection of CD4+-T-Cell Lines , 2003, Journal of Virology.

[34]  G. Walther Inference and Modeling with Log-concave Distributions , 2009, 1010.0305.

[35]  J. Mesirov,et al.  Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. , 1999, Science.

[36]  Matias D. Cattaneo,et al.  Simple Local Polynomial Density Estimators , 2018, Journal of the American Statistical Association.

[37]  B. Lindsay Mixture models : theory, geometry, and applications , 1995 .

[38]  Geurt Jongbloed,et al.  The Iterative Convex Minorant Algorithm for Nonparametric Estimation , 1998 .

[39]  John D. Storey,et al.  Empirical Bayes Analysis of a Microarray Experiment , 2001 .

[40]  V. Gorokhovik Minimal convex majorants of functions and Demyanov–Rubinov exhaustive super(sub)differentials , 2018, Optimization.

[41]  Ery Arias-Castro,et al.  Distribution-free tests for sparse heterogeneous mixtures , 2013, 1308.0346.

[42]  A. Cohen,et al.  Estimation in Mixtures of Two Normal Distributions , 1967 .

[43]  Yen-Chi Chen,et al.  A tutorial on kernel density estimation and recent advances , 2017, 1704.03924.

[44]  Geoffrey J. McLachlan,et al.  Mixture models : inference and applications to clustering , 1989 .

[45]  Matias D. Cattaneo,et al.  lpdensity: Local Polynomial Density Estimation and Inference , 2019, J. Stat. Softw..

[46]  D. Hunter,et al.  Inference for mixtures of symmetric distributions , 2007, 0708.0499.

[47]  Christian P. Robert,et al.  Large-scale inference , 2010 .

[48]  Chong Gu Smoothing Spline Density Estimation: A Dimensionless Automatic Algorithm , 1993 .

[49]  D. Maraganore,et al.  A Genomic Pathway Approach to a Complex Disease: Axon Guidance and Parkinson Disease , 2007, PLoS genetics.

[50]  B. Efron Size, power and false discovery rates , 2007, 0710.2245.

[51]  Étienne Roquain,et al.  False discovery rate control with unknown null distribution: Is it possible to mimic the oracle? , 2022, The Annals of Statistics.

[52]  A. Bowman,et al.  A look at some data on the old faithful geyser , 1990 .

[53]  Arsalane Chouaib Guidoum,et al.  Kernel Estimator and Bandwidth Selection for Density and its Derivatives: The kedd Package, Version 1.0.3 , 2020, 2012.06102.

[54]  M. C. Jones,et al.  Locally parametric nonparametric density estimation , 1996 .

[55]  Sylvain Arlot,et al.  A survey of cross-validation procedures for model selection , 2009, 0907.4728.

[56]  L. Wasserman,et al.  Operating characteristics and extensions of the false discovery rate procedure , 2002 .

[57]  Rohana J. Karunamuni,et al.  A generalized reflection method of boundary correction in kernel density estimation , 2005 .

[58]  Bodhisattva Sen,et al.  Estimation of a two‐component mixture model with applications to multiple testing , 2012, 1204.5488.

[59]  R. Nickl,et al.  Confidence Bands in Density Estimation , 2010, 1002.4801.

[60]  E. Dong,et al.  An interactive web-based dashboard to track COVID-19 in real time , 2020, The Lancet Infectious Diseases.

[61]  E. Arias-Castro,et al.  An EM algorithm for fitting a mixture model with symmetric log-concave densities , 2020, Communications in Statistics - Theory and Methods.

[62]  Frederick R. Forst,et al.  On robust estimation of the location parameter , 1980 .

[63]  C. J. Stone,et al.  An Asymptotically Optimal Window Selection Rule for Kernel Density Estimates , 1984 .

[64]  E. Lander,et al.  Gene expression correlates of clinical prostate cancer behavior. , 2002, Cancer cell.

[65]  Y. Benjamini,et al.  Controlling the false discovery rate: a practical and powerful approach to multiple testing , 1995 .

[66]  R. Samworth Recent Progress in Log-Concave Density Estimation , 2017, Statistical Science.

[67]  John M. MacDonald,et al.  Doubly Robust Internal Benchmarking and False Discovery Rates for Detecting Racial Bias in Police Stops , 2009 .

[68]  Jiashun Jin,et al.  Optimal rates of convergence for estimating the null density and proportion of nonnull effects in large-scale multiple testing , 2010, 1001.1609.

[69]  M. Rudemo Empirical Choice of Histograms and Kernel Density Estimators , 1982 .