Unsupervised Learning of Correlated Multivariate Gaussian Mixture Models Using MML

Mixture modelling, or unsupervised classification, is the problem of identifying and modelling components (also called clusters or classes) in a body of data. We consider here the application of the Minimum Message Length (MML) principle to mixture modelling of multivariate Gaussian distributions. Earlier work in MML mixture modelling covers the multinomial, Gaussian, Poisson, von Mises circular, and Student t distributions; in all of these applications, the variables within a component are assumed to be uncorrelated with each other. In this paper, we propose a more general type of MML mixture modelling that allows the variables within a component to be correlated. Two MML approximations are used: the Wallace and Freeman (1987) approximation and Dowe’s MMLD approximation (2002). The former is used for calculating the relative abundances (mixing proportions) of the components, and the latter for estimating the distribution parameters of the components of the mixture model. The proposed method is applied to the analysis of two real-world datasets: the well-known (Fisher) Iris and diabetes datasets. The modelling results are then compared, in terms of their probability bit-costings, with those obtained using two other modelling criteria, AIC and BIC (which is identical to Rissanen’s 1978 MDL). The comparison shows that the proposed MML method performs better than both of these criteria. Furthermore, the MML method also infers the three underlying Iris species more closely than either AIC or BIC.
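To make the cost of allowing correlated variables concrete, the following minimal Python sketch counts the free parameters of a Gaussian mixture with and without within-component correlation and evaluates the standard AIC and BIC penalties used as comparison criteria. The function names and the log-likelihood value are illustrative assumptions, not taken from the paper, and the sketch does not implement the MML message length itself.

```python
import math


def n_free_params(k, d, correlated=True):
    """Free parameters of a k-component, d-dimensional Gaussian mixture.

    Hypothetical helper for illustration only.
    """
    mixing = k - 1                     # mixing proportions sum to 1
    means = k * d                      # one mean vector per component
    # Full covariance has d(d+1)/2 entries; a diagonal one has d variances.
    per_component = d * (d + 1) // 2 if correlated else d
    return mixing + means + k * per_component


def aic(log_lik, n_params):
    """Akaike's criterion: -2 log L + 2p (log L in nats)."""
    return -2.0 * log_lik + 2.0 * n_params


def bic(log_lik, n_params, n_samples):
    """Schwarz's criterion: -2 log L + p log n (log L in nats)."""
    return -2.0 * log_lik + n_params * math.log(n_samples)


if __name__ == "__main__":
    # Iris-sized problem: 150 samples, 4 variables, 3 components.
    p_full = n_free_params(3, 4, correlated=True)    # 44
    p_diag = n_free_params(3, 4, correlated=False)   # 26
    print(p_full, p_diag)

    # Placeholder log-likelihood, NOT a result from the paper.
    hypothetical_log_lik = -200.0
    print(aic(hypothetical_log_lik, p_full))
    print(bic(hypothetical_log_lik, p_full, 150))
```

For three components in four dimensions, the correlated model carries 44 free parameters against 26 for the uncorrelated one, which is why the choice of penalty matters when deciding how many components a dataset supports.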

[1] David L. Dowe, et al. MML Inference of Decision Graphs with Multi-way Joins and Dynamic Attributes, 2002, Australian Conference on Artificial Intelligence.

[2] Gregory J. Chaitin, et al. On the Length of Programs for Computing Finite Binary Sequences, 1966, JACM.

[3] Lloyd Allison, et al. Univariate Polynomial Inference by Monte Carlo Message Length Approximation, 2002, ICML.

[4] Adrian E. Raftery, et al. How Many Clusters? Which Clustering Method? Answers Via Model-Based Cluster Analysis, 1998, Comput. J.

[5] G. Schwarz. Estimating the Dimension of a Model, 1978.

[6] David L. Dowe, et al. Minimum Message Length and Kolmogorov Complexity, 1999, Comput. J.

[7] Geoffrey J. McLachlan, et al. Finite Mixture Models, 2019, Annual Review of Statistics and Its Application.

[8] R. Fisher. The Use of Multiple Measurements in Taxonomic Problems, 1936.

[9] Xindong Wu, et al. Research and Development in Knowledge Discovery and Data Mining, 1998, Lecture Notes in Computer Science.

[10] C. S. Wallace, et al. An Information Measure for Classification, 1968, Comput. J.

[11] Ray J. Solomonoff, et al. A Formal Theory of Inductive Inference. Part II, 1964, Inf. Control.

[12] C. S. Wallace, et al. Estimation and Inference by Compact Coding, 1987.

[13] David L. Dowe, et al. Clustering of Gaussian and t distributions using Minimum Message Length, 2003.

[14] Mitsuru Ishizuka, et al. PRICAI 2002: Trends in Artificial Intelligence, 2002, Lecture Notes in Computer Science.

[15] Lloyd Allison, et al. Change-Point Estimation Using New Minimum Message Length Approximations, 2002, PRICAI.

[16] Bob McKay, et al. AI 2002: Advances in Artificial Intelligence, 2002, Lecture Notes in Computer Science.

[17] S. Sclove. Application of model-selection criteria to some problems in multivariate analysis, 1987.

[18] C. S. Wallace, et al. MML mixture modelling of multi-state, Poisson, von Mises circular and Gaussian distributions, 1997.

[19] David L. Dowe, et al. MML Clustering of Continuous-Valued Data Using Gaussian and t Distributions, 2002, Australian Joint Conference on Artificial Intelligence.

[20] G. Reaven, et al. An attempt to define the nature of chemical diabetes using a multidimensional analysis, 2004, Diabetologia.

[21] David L. Dowe, et al. Refinements of MDL and MML Coding, 1999, Comput. J.

[22] Murray A. Jorgensen, et al. Theory & Methods: Mixture model clustering using the MULTIMIX program, 1999.

[23] J. Rissanen, et al. Modeling By Shortest Data Description, 1978, Autom.

[24] H. Akaike. A new look at the statistical model identification, 1974.

[25] David L. Dowe, et al. MML clustering of multi-state, Poisson, von Mises circular and Gaussian distributions, 2000, Stat. Comput.

[26] David L. Dowe, et al. Point Estimation Using the Kullback-Leibler Loss Function and MML, 1998, PAKDD.

[27] David L. Dowe, et al. Single Factor Analysis in MML Mixture Modelling, 1998, PAKDD.

[28] Peter Adams, et al. The EMMIX software for the fitting of mixtures of normal and t-components, 1999.

[29] A. Kolmogorov. Three approaches to the quantitative definition of information, 1968.

[30] Anil K. Jain, et al. Unsupervised Learning of Finite Mixture Models, 2002, IEEE Trans. Pattern Anal. Mach. Intell.

[31] Joseph L. Schafer, et al. Analysis of Incomplete Multivariate Data, 1997.

[32] Peter C. Cheeseman, et al. Bayesian Classification (AutoClass): Theory and Results, 1996, Advances in Knowledge Discovery and Data Mining.