Paper Information - A Multiple Cause Mixture Model for Unsupervised Learning

A Multiple Cause Mixture Model for Unsupervised Learning

This paper presents a formulation for unsupervised learning of clusters reflecting multiple causal structure in binary data. Unlike the hard k-means clustering algorithm and the soft mixture model, each of which assumes that a single hidden event generates each data point, a multiple cause model accounts for observed data by combining assertions from many hidden causes, each of which can pertain to varying degree to any subset of the observable dimensions. We employ an objective function and iterative gradient descent learning algorithm resembling the conventional mixture model. A crucial issue is the mixing function for combining beliefs from different cluster centers in order to generate data predictions whose errors are minimized both during recognition and learning. The mixing function constitutes a prior assumption about underlying structural regularities of the data domain; we demonstrate a weakness inherent to the popular weighted sum followed by sigmoid squashing, and offer alternative forms of the nonlinearity for two types of data domain. Results are presented demonstrating the algorithm's ability successfully to discover coherent multiple causal representations in several experimental data sets.
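The role of the mixing function described above can be illustrated with a minimal sketch. It assumes K hidden causes with activities `m[k]` in [0, 1] and a K x J weight table `c[k][j]`, and contrasts the weighted sum followed by sigmoid squashing with a noisy-OR combination. Noisy-OR is used here only as one plausible alternative nonlinearity for binary data, since the abstract does not spell out the alternative forms. The function names and the toy weights are hypothetical, and the sketch does not attempt to reproduce the paper's argument or its learning algorithm.

```python
import math


def sigmoid_mix(m, c):
    """Weighted sum followed by sigmoid squashing.

    m: list of K cause activities (assumed to lie in [0, 1])
    c: K x J nested list of real-valued weights (hypothetical layout)
    Returns J predictions in (0, 1).
    """
    num_dims = len(c[0])
    preds = []
    for j in range(num_dims):
        total = sum(m[k] * c[k][j] for k in range(len(m)))
        preds.append(1.0 / (1.0 + math.exp(-total)))
    return preds


def noisy_or_mix(m, c):
    """Noisy-OR mixing: r_j = 1 - prod_k (1 - m_k * c_kj).

    Assumes both m[k] and c[k][j] lie in [0, 1], so each product term
    is the probability that cause k fails to turn dimension j on.
    """
    num_dims = len(c[0])
    preds = []
    for j in range(num_dims):
        p_off = 1.0
        for k in range(len(m)):
            p_off *= 1.0 - m[k] * c[k][j]
        preds.append(1.0 - p_off)
    return preds


# Toy example (made up for illustration): two causes, four binary dimensions.
# Cause 0 asserts dimensions 0 and 1; cause 1 asserts dimensions 1 and 2.
weights = [
    [1.0, 1.0, 0.0, 0.0],
    [0.0, 1.0, 1.0, 0.0],
]

both_on = noisy_or_mix([1.0, 1.0], weights)
half_on = noisy_or_mix([0.5, 0.5], weights)
none_on_sigmoid = sigmoid_mix([0.0, 0.0], weights)
```

With both causes fully active, the noisy-OR prediction turns on exactly the dimensions asserted by at least one cause, and overlapping assertions (dimension 1) saturate rather than add up. In this bias-free sketch the sigmoid form predicts 0.5 for every dimension when no cause is active.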

Eric Saund

[1] Richard O. Duda, et al. Pattern classification and scene analysis, 1974, A Wiley-Interscience publication.

[2] John E. Warnock, et al. A device independent graphics imaging model for use with raster devices, 1982, SIGGRAPH.

[3] Geoffrey E. Hinton, et al. A Learning Algorithm for Boltzmann Machines, 1985, Cogn. Sci.

[4] Terence D. Sanger, et al. An Optimality Principle for Unsupervised Learning, 1988, NIPS.

[5] Steven J. Nowlan, et al. Maximum Likelihood Competitive Learning, 1989, NIPS.

[6] David Haussler, et al. Unsupervised learning of distributions on binary vectors using two layer networks, 1991, NIPS.

[7] Jürgen Schmidhuber, et al. Learning Factorial Codes by Predictability Minimization, 1992, Neural Computation.

[8] Volker Tresp, et al. Some Solutions to the Missing Feature Problem in Vision, 1992, NIPS.

[9] Geoffrey E. Hinton, et al. Autoencoders, Minimum Description Length and Helmholtz Free Energy, 1993, NIPS.

[10] R. Zemel. A minimum description length framework for unsupervised learning, 1994.

[11] Geoffrey E. Hinton, et al. The Helmholtz Machine, 1995, Neural Computation.

[12] Geoffrey E. Hinton, et al. Varieties of Helmholtz Machine, 1996, Neural Networks.

[13] Rajesh P. N. Rao, et al. Dynamic Model of Visual Recognition Predicts Neural Response Properties in the Visual Cortex, 1997, Neural Computation.