Global Optimization of Finite Mixture Models

Title of dissertation: GLOBAL OPTIMIZATION OF FINITE MIXTURE MODELS

Jeffrey W. Heath, Doctor of Philosophy, 2007

Dissertation directed by: Professor Michael Fu, Robert H. Smith School of Business, and Professor Wolfgang Jank, Robert H. Smith School of Business

The Expectation-Maximization (EM) algorithm is a popular and convenient tool for the estimation of Gaussian mixture models and their natural extension, model-based clustering. However, while the algorithm is convenient to implement and numerically very stable, it only produces solutions that are locally optimal. Thus, EM may not achieve the globally optimal solution in Gaussian mixture analysis problems, which can have a large number of local optima. This dissertation introduces several new algorithms designed to produce globally optimal solutions for Gaussian mixture models. The building blocks for these algorithms are methods from the operations research literature, namely the Cross-Entropy (CE) method and Model Reference Adaptive Search (MRAS). The new algorithms we propose must efficiently simulate positive definite covariance matrices of the Gaussian mixture components, and we propose several new solutions to this problem. One solution is to blend the updating procedure of CE and MRAS with the principles of Expectation-Maximization updating for the covariance matrices, leading to two new algorithms, CE-EM and MRAS-EM. We also propose two additional algorithms, CE-CD and MRAS-CD, which rely on the Cholesky decomposition to construct the random covariance matrices. Numerical experiments illustrate the effectiveness of the proposed algorithms in finding global optima where the classical EM fails to do so. We find that although a single run of the new algorithms may be slower than EM, they have the potential of producing significantly better global solutions to the model-based clustering problem. We also show that the global optimum matters in the sense that it significantly improves the clustering task.
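To make the local-optimum issue concrete, the sketch below implements a minimal EM iteration for a two-component one-dimensional Gaussian mixture. This is a generic textbook illustration, not the dissertation's implementation; the function name and data are invented for the example. It exercises the key property the abstract relies on: EM monotonically increases the log-likelihood, so where it ends up depends entirely on where it starts.

```python
import numpy as np

def em_1d_two_component(x, mu_init, n_iter=50):
    """Minimal EM for a two-component 1-D Gaussian mixture (illustration only)."""
    mu = np.array(mu_init, dtype=float)
    sigma2 = np.array([np.var(x), np.var(x)])  # start both variances at the sample variance
    pi = np.array([0.5, 0.5])                  # equal initial mixing weights
    ll_trace = []
    for _ in range(n_iter):
        # E-step: component densities and posterior responsibilities.
        dens = pi * np.exp(-0.5 * (x[:, None] - mu) ** 2 / sigma2) \
               / np.sqrt(2 * np.pi * sigma2)
        ll_trace.append(np.log(dens.sum(axis=1)).sum())
        resp = dens / dens.sum(axis=1, keepdims=True)
        # M-step: weighted maximum-likelihood updates.
        nk = resp.sum(axis=0)
        pi = nk / len(x)
        mu = (resp * x[:, None]).sum(axis=0) / nk
        sigma2 = (resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk
    return mu, ll_trace

rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(-3, 1, 200), rng.normal(3, 1, 200)])
mu, ll = em_1d_two_component(x, mu_init=[-1.0, 1.0])
# EM's log-likelihood is monotonically non-decreasing across iterations.
assert all(b >= a - 1e-8 for a, b in zip(ll, ll[1:]))
```

Rerunning the same routine from different `mu_init` values (e.g. both starts on the same side of zero) can terminate at a lower final log-likelihood, which is exactly the failure mode the global-optimization algorithms above target.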
Furthermore, we provide a theoretical proof of global convergence to the optimal solution of the likelihood function of Gaussian mixtures for one of the algorithms, namely MRAS-CD. This offers support that the algorithm is not merely an ad-hoc heuristic, but is systematically designed to produce global solutions to Gaussian mixture models. Finally, we investigate the fitness landscape of Gaussian mixture models and give evidence for why this is a difficult global optimization problem. We discuss different metrics that can be used to evaluate the difficulty of global optimization problems, and then apply them in the context of Gaussian mixture models.
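The covariance-simulation idea behind the CD variants (CE-CD and MRAS-CD) can be sketched in a few lines. This is a generic illustration of the Cholesky construction, not the dissertation's exact sampling scheme: any lower-triangular matrix L with strictly positive diagonal entries is a valid Cholesky factor, so L Lᵀ is guaranteed symmetric positive definite, which lets a random search sample unconstrained entries of L rather than constrained covariance matrices directly.

```python
import numpy as np

def random_pd_covariance(d, rng):
    """Simulate a random d x d positive definite covariance matrix via its
    Cholesky factor: draw lower-triangular L with positive diagonal,
    then return L @ L.T, which is symmetric positive definite."""
    L = np.tril(rng.standard_normal((d, d)))
    # Force strictly positive diagonal entries so L is a valid Cholesky factor.
    idx = np.diag_indices(d)
    L[idx] = np.abs(L[idx]) + 1e-6
    return L @ L.T

rng = np.random.default_rng(0)
sigma = random_pd_covariance(3, rng)
# Positive definiteness: all eigenvalues are strictly positive.
assert np.all(np.linalg.eigvalsh(sigma) > 0)
```

Parameterizing the search space through L sidesteps the need to reject or repair candidate covariance matrices that fail the positive definiteness constraint.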
