Nonparametric Finite Mixture Models with Possible Shape Constraints: A Cubic Newton Approach

We explore computational aspects of maximum likelihood estimation of the mixture proportions of a nonparametric finite mixture model. This is a convex optimization problem with old roots in statistics and a key member of the modern data analysis toolkit. Motivated by problems in shape-constrained inference, we consider structured variants of this problem with additional convex polyhedral constraints. We propose a new cubic regularized Newton method for this problem and present novel worst-case and local computational guarantees for our algorithm. We extend earlier work by Nesterov and Polyak to the case of a self-concordant objective with polyhedral constraints, such as the ones considered herein. We propose a Frank-Wolfe method to solve the cubic regularized Newton subproblem, and we derive efficient solutions for the linear optimization oracles that may be of independent interest. In the particular case of Gaussian mixtures without shape constraints, we derive bounds on how well the finite mixture problem approximates the infinite-dimensional Kiefer-Wolfowitz maximum likelihood estimator. Experiments on synthetic and real datasets suggest that our proposed algorithms exhibit improved runtimes and scalability over existing benchmarks.
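To make the setup concrete, the following is a minimal sketch of the idea and not the authors' implementation. It assumes unit-variance Gaussian components on a fixed grid of atoms and no shape constraints, so the feasible set is the unit simplex. The function names, the doubling rule for the cubic parameter `M`, and the bisection line search are illustrative assumptions rather than details taken from the paper.

```python
import numpy as np

def objective(L, w):
    # Negative average log-likelihood of the mixture proportions w.
    return -np.mean(np.log(L @ w))

def model_grad(g, H, M, d):
    # Gradient in d of the cubic model g.d + 0.5 d'Hd + (M/6)||d||^3.
    return g + H @ d + 0.5 * M * np.linalg.norm(d) * d

def fw_subproblem(g, H, M, w, iters=100):
    # Frank-Wolfe on the cubic model over the unit simplex, started at w.
    y = w.copy()
    for _ in range(iters):
        grad = model_grad(g, H, M, y - w)
        s = np.zeros_like(y)
        s[np.argmin(grad)] = 1.0       # linear oracle on the simplex: a vertex
        p = s - y
        if grad @ p > -1e-12:          # Frank-Wolfe gap is about zero: stop
            break
        lo, hi = 0.0, 1.0              # bisection on the directional derivative
        for _ in range(30):
            mid = 0.5 * (lo + hi)
            if model_grad(g, H, M, y + mid * p - w) @ p < 0:
                lo = mid
            else:
                hi = mid
        y = y + lo * p                 # convex combination, stays in the simplex
    return y

def cubic_newton(L, M=1.0, outer=20):
    # Cubic regularized Newton iterations with an assumed doubling rule for M.
    n, m = L.shape
    w = np.full(m, 1.0 / m)
    for _ in range(outer):
        r = L @ w
        g = -(L.T @ (1.0 / r)) / n               # gradient of the objective
        H = (L / r[:, None] ** 2).T @ L / n      # Hessian of the objective
        for _ in range(30):
            y = fw_subproblem(g, H, M, w)
            if objective(L, y) <= objective(L, w):
                w = y                            # accept a non-increasing step
                break
            M *= 2.0                             # otherwise regularize more
    return w

# Usage on synthetic data (assumed setup: two unit-variance Gaussian clusters).
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-2.0, 1.0, 100), rng.normal(2.0, 1.0, 100)])
atoms = np.linspace(-4.0, 4.0, 25)
L = np.exp(-0.5 * (x[:, None] - atoms[None, :]) ** 2) / np.sqrt(2.0 * np.pi)
w_hat = cubic_newton(L)
```

Here `L[i, j]` is the density of observation `i` under atom `j`, so the objective depends on the data only through this matrix. Polyhedral shape constraints would change the linear oracle inside `fw_subproblem`, which this sketch does not attempt.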
