On learning statistical mixtures maximizing the complete likelihood: The k-MLE methodology using geometric hard clustering

Statistical mixtures are semi-parametric models ubiquitous in data science, since they can universally approximate smooth densities arbitrarily closely. Finite mixtures are usually inferred from data using the celebrated Expectation-Maximization (EM) framework, which locally and iteratively maximizes the incomplete likelihood by softly assigning data to mixture components. In this paper, we present a novel methodology that infers mixtures by transforming the learning problem into a sequence of geometric center-based hard clustering problems, and that provably increases the complete likelihood monotonically. Our versatile method is fast and has a low memory footprint: the core inner steps can be implemented with various generalized k-means-type heuristics, so recent results on clustering carry over directly to mixture learning. In particular, for mixtures of singly-parametric distributions, including for example the Rayleigh, Weibull, and Poisson distributions, we show how to use dynamic programming to solve exactly the inner geometric clustering problems.
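To make the hard-clustering viewpoint concrete, here is a minimal sketch of the k-MLE loop for a toy mixture of Rayleigh distributions (Python/NumPy; the function names, the Forgy-style random seeding, and the weight floor are illustrative assumptions, not the paper's reference implementation). Each point is hard-assigned to the component that maximizes its contribution to the complete log-likelihood; each component is then refit by its closed-form per-cluster maximum likelihood estimate, and the mixture weights are refreshed from the cluster sizes.

```python
# Minimal sketch of a k-MLE-style loop for a mixture of Rayleigh
# distributions. Illustrative toy code, not the paper's implementation.
import numpy as np

def rayleigh_logpdf(x, sigma2):
    # log p(x; sigma^2) = log x - log sigma^2 - x^2 / (2 sigma^2), x > 0
    return np.log(x) - np.log(sigma2) - x**2 / (2.0 * sigma2)

def k_mle_rayleigh(x, k, n_iter=50, seed=None):
    rng = np.random.default_rng(seed)
    n = x.shape[0]
    # Forgy-style seeding: scale parameters from k random points
    # (sigma^2 = x^2 / 2 is the single-point MLE).
    sigma2 = x[rng.choice(n, size=k, replace=False)] ** 2 / 2.0
    w = np.full(k, 1.0 / k)  # uniform initial mixture weights
    for _ in range(n_iter):
        # Hard assignment: each point joins the component maximizing
        # log w_j + log p(x_i; sigma_j^2), its complete-likelihood term.
        scores = np.log(w)[None, :] + np.stack(
            [rayleigh_logpdf(x, s2) for s2 in sigma2], axis=1)
        z = scores.argmax(axis=1)
        # Per-cluster MLE: sigma_j^2 = mean of x_i^2 over cluster j, halved.
        for j in range(k):
            members = x[z == j]
            if members.size > 0:
                sigma2[j] = np.mean(members**2) / 2.0
        # Weights are the cluster proportions; a tiny floor avoids log(0)
        # if a cluster goes empty.
        counts = np.bincount(z, minlength=k).astype(float)
        w = np.clip(counts / n, 1e-12, None)
    return w, sigma2, z

# Usage on synthetic data drawn from two Rayleigh components:
rng = np.random.default_rng(0)
x = np.concatenate([rng.rayleigh(1.0, 500), rng.rayleigh(4.0, 500)])
w, sigma2, z = k_mle_rayleigh(x, k=2, seed=0)
```

Because the assignment step and the per-cluster MLE step are both coordinate-wise maximizations of the same objective, each iteration can only increase the complete likelihood, exactly as Lloyd's iterations can only decrease the k-means cost.

For the one-dimensional case, the inner hard clustering problem can be solved exactly by dynamic programming over the sorted data, since optimal clusters form contiguous intervals. The sketch below uses the squared-error (k-means) cost for simplicity; in the k-MLE setting, the per-interval cost would instead be the negative maximized log-likelihood of that interval. The O(n^2 k) Bellman-style recurrence and the variable names here are illustrative assumptions.

```python
# Minimal sketch of exact 1D clustering by dynamic programming
# (Bellman recurrence, squared-error interval cost; illustrative code).
import numpy as np

def optimal_1d_clustering(x, k):
    x = np.sort(np.asarray(x, dtype=float))
    n = x.size
    s1 = np.concatenate([[0.0], np.cumsum(x)])     # prefix sums
    s2 = np.concatenate([[0.0], np.cumsum(x**2)])  # prefix sums of squares

    def cost(a, b):
        # Sum of squared deviations of x[a:b] from its mean, in O(1).
        m = b - a
        return s2[b] - s2[a] - (s1[b] - s1[a]) ** 2 / m

    # dp[j, i]: optimal cost of splitting the first i points into j intervals.
    dp = np.full((k + 1, n + 1), np.inf)
    dp[0, 0] = 0.0
    back = np.zeros((k + 1, n + 1), dtype=int)
    for j in range(1, k + 1):
        for i in range(j, n + 1):
            for t in range(j - 1, i):  # t = end of the first j-1 intervals
                c = dp[j - 1, t] + cost(t, i)
                if c < dp[j, i]:
                    dp[j, i] = c
                    back[j, i] = t
    # Backtrack the interval boundaries (half-open index ranges).
    bounds, i = [], n
    for j in range(k, 0, -1):
        t = back[j, i]
        bounds.append((t, i))
        i = t
    return dp[k, n], bounds[::-1]
```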
