Single Factor Analysis in MML Mixture Modelling

Mixture modelling concerns the unsupervised discovery of clusters within data. Most current clustering algorithms assume that variables within classes are uncorrelated. We present a method for producing and evaluating models that account for inter-attribute correlation within classes with a single Gaussian linear factor. The method used is Minimum Message Length (MML), an invariant, information-theoretic Bayesian criterion for evaluating hypotheses. Our work extends and unifies that of Wallace and Boulton (1968) and Wallace and Freeman (1992), concerned respectively with MML mixture modelling and MML single factor analysis. Results on simulated data are comparable to those of Wallace and Freeman (1992) and outperform Maximum Likelihood estimation. We include an application of mixture modelling with single factors to spectral data from the Infrared Astronomical Satellite. Because factors model the within-class correlation, our model contains fewer unnecessary classes than the one produced by AutoClass (Goebel et al. 1989).
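To make "a single Gaussian linear factor" concrete: in standard single-factor analysis, a K-attribute datum in one class is modelled as x = μ + v·a + σ⊙r, with factor score v ~ N(0, 1) and independent noise r ~ N(0, I). The class covariance is therefore diag(σ²) + a aᵀ, so one loading vector `a` captures all inter-attribute correlation. The sketch below evaluates this class density and a mixture of such classes in NumPy. It is a minimal illustration under stated assumptions: the function and parameter names (`single_factor_logpdf`, `mixture_loglik`, `mu`, `a`, `sigma`) are hypothetical, and it computes only the data likelihood. It does not compute the MML message length, which additionally charges for stating the parameters.

```python
import numpy as np

def single_factor_logpdf(x, mu, a, sigma):
    """Log-density of each row of x under N(mu, diag(sigma**2) + outer(a, a)).

    Hypothetical helper. x: (n, K) data; mu, a, sigma: (K,) class mean,
    factor loadings, and specific (per-attribute) standard deviations.
    """
    x = np.atleast_2d(x)
    K = mu.shape[0]
    d_inv = 1.0 / sigma**2          # diagonal of D^{-1}, D = diag(sigma**2)
    r = x - mu                      # residuals, shape (n, K)
    b = a * d_inv                   # D^{-1} a
    denom = 1.0 + a @ b             # 1 + a^T D^{-1} a
    # Woodbury identity: r^T S^{-1} r = r^T D^{-1} r - (a^T D^{-1} r)^2 / denom
    quad = (r**2 * d_inv).sum(axis=1) - (r @ b) ** 2 / denom
    # Matrix determinant lemma: log|S| = sum(log sigma^2) + log(denom)
    logdet = np.log(sigma**2).sum() + np.log(denom)
    return -0.5 * (K * np.log(2 * np.pi) + logdet + quad)

def mixture_loglik(x, weights, params):
    """Total log-likelihood of x under a mixture of single-factor classes.

    weights: class proportions; params: list of (mu, a, sigma) tuples.
    """
    comps = np.stack([np.log(w) + single_factor_logpdf(x, *p)
                      for w, p in zip(weights, params)])  # shape (J, n)
    m = comps.max(axis=0)           # log-sum-exp over classes, for stability
    return float((m + np.log(np.exp(comps - m).sum(axis=0))).sum())
```

The Woodbury and determinant-lemma forms avoid building or inverting a K×K matrix, so each class costs O(nK). For one datum, `single_factor_logpdf(np.zeros(3), np.zeros(3), np.ones(3), np.ones(3))` evaluates a class whose three attributes share one unit-loading factor.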

References

[1] G. McLachlan et al. The EM algorithm and extensions, 1996.

[2] Geoffrey E. Hinton et al. Autoencoders, Minimum Description Length and Helmholtz Free Energy, 1993, NIPS.

[3] David L. Dowe et al. Point Estimation Using the Kullback-Leibler Loss Function and MML, 1998, PAKDD.

[4] C. S. Wallace et al. MML mixture modelling of multi-state, Poisson, von Mises circular and Gaussian distributions, 1997.

[5] Matthew Self et al. Bayesian Classification, 1988, AAAI.

[6] Geoffrey E. Hinton et al. Modeling the manifolds of images of handwritten digits, 1997, IEEE Trans. Neural Networks.

[7] C. S. Wallace et al. Circular clustering of protein dihedral angles by Minimum Message Length, 1996, Pacific Symposium on Biocomputing.

[8] H. Aumann et al. IRAS catalogues and atlases. Atlas of low-resolution spectra, 1986.

[9] A. F. Smith et al. Statistical analysis of finite mixture distributions, 1986.

[10] Waldemar W. Koczkodaj et al. Advances in Computing and Information - ICCI '90, 1990, Lecture Notes in Computer Science.

[11] C. S. Wallace et al. Estimation and Inference by Compact Coding, 1987.

[12] J. Neyman et al. Consistent Estimates Based on Partially Consistent Observations, 1948.

[13] C. S. Wallace et al. Single-factor analysis by minimum message length estimation, 1992.

[14] E. B. Andersen et al. Modern factor analysis, 1961.

[15] C. S. Wallace et al. Bayesian Estimation of the Von Mises Concentration Parameter, 1996.

[16] C. Beichman et al. Infrared astronomical satellite (IRAS) catalogs and atlases. Volume 1: Explanatory supplement, 1988.

[17] J. Stutz et al. A Bayesian classification of the IRAS LRS Atlas, 1989.

[18] C. S. Wallace et al. Classification by Minimum-Message-Length Inference, 1991, ICCI.

[19] C. S. Wallace et al. Resolving the Neyman-Scott problem by minimum message length, 1997.

[20] C. S. Wallace et al. An Information Measure for Classification, 1968, Comput. J.