Maximum likelihood estimation of a multi‐dimensional log‐concave density

Summary.  Let X1,…,Xn be independent and identically distributed random vectors with a (Lebesgue) density f. We first prove that, with probability 1, there is a unique log‐concave maximum likelihood estimator f̂n of f. This estimator is attractive because, unlike kernel density estimation, the method is fully automatic, with no smoothing parameters to choose. Although the existence proof is non‐constructive, we can reformulate the problem of computing f̂n as a non‐differentiable convex optimization problem, and thus combine techniques of computational geometry with Shor's r‐algorithm to produce a sequence that converges to f̂n. An R version of the algorithm is available in the package LogConcDEAD (log‐concave density estimation in arbitrary dimensions). We demonstrate that the estimator has attractive theoretical properties both when the true density is log‐concave and when this model is misspecified. For the moderate or large sample sizes in our simulations, f̂n is shown to have smaller mean integrated squared error than kernel‐based methods, even when we allow the use of a theoretical, optimal fixed bandwidth for the kernel estimator that would not be available in practice. We also present a real data clustering example, which shows that our methodology can be used in conjunction with the expectation–maximization algorithm to fit finite mixtures of log‐concave densities.
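
The convex reformulation mentioned above can be illustrated in one dimension. The sketch below is not the LogConcDEAD implementation; it assumes the objective takes the form σ(y) = −(1/n) Σ yᵢ + ∫ exp{h(x)} dx, where h is the least concave function with h(Xᵢ) ≥ yᵢ on the range of the data (a form not spelled out in the summary itself). The function names `upper_hull` and `sigma` are hypothetical, and the data points are assumed distinct. Minimizing σ over y, for instance with a subgradient-type method such as Shor's r‐algorithm, would then give the values of the log‐density at the data points.

```python
import math


def upper_hull(points):
    """Vertices of the least concave majorant of (x, y) points sorted by x."""
    hull = []
    for p in points:
        # Drop the last vertex while it lies on or below the chord
        # joining the vertex before it to the new point p.
        while len(hull) >= 2:
            (x1, y1), (x2, y2) = hull[-2], hull[-1]
            cross = (x2 - x1) * (p[1] - y1) - (y2 - y1) * (p[0] - x1)
            if cross >= 0:
                hull.pop()
            else:
                break
        hull.append(p)
    return hull


def sigma(xs, ys):
    """Assumed objective: -mean(y) + integral of exp(concave hull of y).

    xs: distinct one-dimensional observations (assumption of this sketch).
    ys: candidate log-density values at those observations.
    """
    hull = upper_hull(sorted(zip(xs, ys)))
    integral = 0.0
    for (a, ya), (b, yb) in zip(hull, hull[1:]):
        d = yb - ya
        if abs(d) < 1e-12:
            # Flat segment: exp(h) is constant on [a, b].
            integral += (b - a) * math.exp(ya)
        else:
            # Linear segment: closed-form integral of exp of a linear function.
            integral += (b - a) * (math.exp(yb) - math.exp(ya)) / d
    return -sum(ys) / len(ys) + integral


if __name__ == "__main__":
    # Log of the uniform density on [0, 1], evaluated at three points.
    print(sigma([0.0, 0.5, 1.0], [0.0, 0.0, 0.0]))
```

The objective is non‐differentiable because the set of hull vertices changes as y varies, which is why a method designed for non‐smooth convex problems is needed rather than plain gradient descent. In higher dimensions the hull and the integral would be computed over a triangulation of the data, which is where computational geometry enters.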
