Self-organizing mixture networks for probability density estimation

A self-organizing mixture network (SOMN) is derived for learning arbitrary density functions. The network minimizes the Kullback-Leibler divergence (relative entropy) between the true density and its estimate by means of stochastic approximation. Density functions are modeled as mixtures of parametric distributions; a mixture need not be homogeneous, i.e., its components may have different density profiles. The first layer of the network is similar to Kohonen's self-organizing map (SOM), but with the parameters of the component densities as the learning weights. The winner is selected by maximum posterior probability, and weight updates are confined to a small neighborhood around the winner. The second layer accumulates the responses of these local nodes, weighted by the learned mixing parameters. The network has a simple structure and computational form, yet converges quickly and robustly. The relative-entropy criterion also gives the network a degree of generalization ability. Applications to density profile estimation and pattern classification are presented. The SOMN also offers insight into the role of the neighborhood function used in the SOM.
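
To make the learning scheme concrete, the following is a minimal sketch of a SOMN-style estimator for one-dimensional data with homogeneous Gaussian components. The class name `SOMN`, the method names, and the linear decay schedules for the step size and neighborhood width are illustrative assumptions, not the paper's exact specification; only the overall structure follows the description above: a maximum-posterior winner, neighborhood-confined stochastic-approximation updates of the means, variances, and mixing weights, and a second layer that sums the component responses weighted by the mixing parameters.

```python
import numpy as np

class SOMN:
    """Sketch of a self-organizing mixture network for 1-D data,
    assuming Gaussian component densities on a 1-D node lattice."""

    def __init__(self, n_nodes=10, seed=0):
        rng = np.random.default_rng(seed)
        self.mu = rng.normal(0.0, 1.0, n_nodes)    # component means
        self.var = np.ones(n_nodes)                # component variances
        self.pi = np.full(n_nodes, 1.0 / n_nodes)  # mixing weights
        self.grid = np.arange(n_nodes)             # lattice positions

    def _component_pdf(self, x):
        # Gaussian density of each node evaluated at x
        return (np.exp(-0.5 * (x - self.mu) ** 2 / self.var)
                / np.sqrt(2.0 * np.pi * self.var))

    def pdf(self, x):
        # Second layer: mixture density, node responses weighted
        # by the learned mixing parameters
        x = np.atleast_1d(x)[:, None]
        return np.sum(self.pi * self._component_pdf(x), axis=1)

    def fit(self, data, alpha0=0.5, sigma0=2.0, n_epochs=5):
        t, T = 0, n_epochs * len(data)
        for _ in range(n_epochs):
            for x in np.random.permutation(data):
                alpha = alpha0 * (1.0 - t / T)            # decaying step size
                width = max(sigma0 * (1.0 - t / T), 0.5)  # shrinking neighborhood
                post = self.pi * self._component_pdf(x)
                post /= post.sum()                        # posterior P(i | x)
                winner = np.argmax(post)                  # maximum-posterior winner
                h = np.exp(-0.5 * ((self.grid - winner) / width) ** 2)
                # stochastic-approximation updates, confined to the neighborhood
                g = alpha * h * post
                d = x - self.mu
                self.mu += g * d
                self.var += g * (d * d - self.var)
                self.pi += alpha * h * (post - self.pi)
                self.pi = np.clip(self.pi, 1e-8, None)
                self.pi /= self.pi.sum()                  # keep weights normalized
                t += 1

# Illustrative usage: estimate a bimodal density from samples.
rng = np.random.default_rng(1)
data = np.concatenate([rng.normal(-2, 0.5, 500), rng.normal(2, 1.0, 500)])
net = SOMN(n_nodes=8)
net.fit(data)
print(net.pdf(np.array([-2.0, 0.0, 2.0])))  # high, low, high
```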
