Mixture of Experts Classification Using a Hierarchical Mixture Model

A three-level hierarchical mixture model for classification is presented that models the following data generation process: (1) the data are generated by a finite number of sources (clusters), and (2) the generation mechanism of each source assumes the existence of individual internal class-labeled sources (subclusters of the external cluster). The model estimates the posterior probability of class membership similar to a mixture of experts classifier. In order to learn the parameters of the model, we have developed a general training approach based on maximum likelihood that results in two efficient training algorithms. Compared to other classification mixture models, the proposed hierarchical model exhibits several advantages and provides improved classification performance as indicated by the experimental results.

[1]  Aristidis Likas,et al.  Shared kernel models for class conditional density estimation , 2001, IEEE Trans. Neural Networks.

[2]  D. Rubin,et al.  Maximum likelihood from incomplete data via the EM - algorithm plus discussions on the paper , 1977 .

[3]  G. McLachlan,et al.  The EM algorithm and extensions , 1996 .

[4]  Geoffrey E. Hinton,et al.  Adaptive Mixtures of Local Experts , 1991, Neural Computation.

[5]  Peter E. Hart,et al.  Pattern classification and scene analysis , 1974, A Wiley-Interscience publication.

[6]  Nikos A. Vlassis,et al.  A Greedy EM Algorithm for Gaussian Mixture Learning , 2002, Neural Processing Letters.

[7]  Brian Everitt,et al.  An Introduction to Latent Variable Models , 1984 .

[8]  R. Tibshirani,et al.  Discriminant Analysis by Gaussian Mixtures , 1996 .

[9]  New York Dover,et al.  ON THE CONVERGENCE PROPERTIES OF THE EM ALGORITHM , 1983 .

[10]  Catherine Blake,et al.  UCI Repository of machine learning databases , 1998 .

[11]  Christopher M. Bishop,et al.  A Hierarchical Latent Variable Model for Data Visualization , 1998, IEEE Trans. Pattern Anal. Mach. Intell..

[12]  Richard O. Duda,et al.  Pattern classification and scene analysis , 1974, A Wiley-Interscience publication.

[13]  Michael I. Jordan,et al.  Supervised learning from incomplete data via an EM approach , 1993, NIPS.

[14]  Volker Tresp,et al.  Improved Gaussian Mixture Density Estimates Using Bayesian Penalty Terms and Network Averaging , 1995, NIPS.

[15]  David J. Miller,et al.  A Mixture of Experts Classifier with Learning Based on Both Labelled and Unlabelled Data , 1996, NIPS.

[16]  Geoffrey E. Hinton,et al.  SMEM Algorithm for Mixture Models , 1998, Neural Computation.

[17]  Robert A. Jacobs,et al.  Hierarchical Mixtures of Experts and the EM Algorithm , 1993, Neural Computation.

[18]  David J. Miller,et al.  Combined Learning and Use for a Mixture Model Equivalent to the RBF Classifier , 1998, Neural Computation.