A variational EM algorithm for large databases

The EM algorithm is one of the most popular statistical learning algorithms. It is a method for parameter estimation in problems involving missing data. However, it is a batch learning method and often requires significant computational resources, so more elaborate methods are needed to handle databases with a large number of records or high dimensionality. In this paper, we present an algorithm that significantly reduces the computational cost. The algorithm is based on partial E-steps and retains the standard convergence guarantee of EM. It is a version of the incremental EM algorithm that cycles through the data cases in blocks. We confirm through its application to large databases that the algorithm substantially reduces computational cost.
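
As a rough illustration of the block-wise incremental idea described above, the sketch below applies partial E-steps to a Gaussian mixture model: each pass updates the sufficient statistics of one block of data cases and immediately re-estimates the parameters, rather than scanning the whole database before every M-step. The function name `block_em_gmm`, the diagonal-covariance Gaussian mixture, and the block sizes are illustrative assumptions, not the paper's exact algorithm.

```python
# A minimal sketch of incremental (block-wise) EM for a diagonal-covariance
# Gaussian mixture. Illustrative only; the specific model and names are assumptions.
import numpy as np

def block_em_gmm(X, K, n_blocks=10, n_passes=5, seed=0):
    rng = np.random.default_rng(seed)
    n, d = X.shape
    # Initialise parameters from random data points.
    means = X[rng.choice(n, K, replace=False)].copy()
    covs = np.tile(np.var(X, axis=0), (K, 1))      # diagonal covariances, shape (K, d)
    weights = np.full(K, 1.0 / K)

    # Per-block sufficient statistics, kept between passes.
    blocks = np.array_split(np.arange(n), n_blocks)
    S0 = np.zeros((n_blocks, K))                    # sum of responsibilities
    S1 = np.zeros((n_blocks, K, d))                 # sum of r * x
    S2 = np.zeros((n_blocks, K, d))                 # sum of r * x^2

    def responsibilities(Xb):
        # log N(x | mu_k, diag(cov_k)) + log weight_k, normalised per row.
        log_p = -0.5 * (((Xb[:, None, :] - means[None]) ** 2) / covs[None]
                        + np.log(2 * np.pi * covs[None])).sum(-1)
        log_p += np.log(weights)
        log_p -= log_p.max(axis=1, keepdims=True)
        r = np.exp(log_p)
        return r / r.sum(axis=1, keepdims=True)

    for _ in range(n_passes):
        for b, idx in enumerate(blocks):
            Xb = X[idx]
            r = responsibilities(Xb)                # partial E-step: one block only
            S0[b] = r.sum(axis=0)
            S1[b] = r.T @ Xb
            S2[b] = r.T @ (Xb ** 2)
            # M-step from current totals (other blocks keep their old statistics).
            T0, T1, T2 = S0.sum(0), S1.sum(0), S2.sum(0)
            weights = T0 / T0.sum()
            means = T1 / T0[:, None]
            covs = np.maximum(T2 / T0[:, None] - means ** 2, 1e-6)
    return weights, means, covs
```

Because only one block is visited per parameter update, each iteration touches a fraction of the records, which is where the computational savings on large databases come from in this kind of scheme.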