A variational EM algorithm for large databases

The EM algorithm is one of the most popular statistical learning algorithms. It is a method for parameter estimation in problems involving missing data. However, it is a batch learning method and often requires significant computational resources, so more elaborate methods are needed to handle databases with a large number of records or high dimensionality. In this paper, we present an algorithm that significantly reduces the computational cost. The algorithm is based on partial E-steps and retains the standard convergence guarantee of EM. It is a version of the incremental EM algorithm that cycles through the data cases in blocks. We confirm through its application to large databases that the algorithm substantially reduces computational cost.
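
As a rough illustration of the block-wise incremental idea described above, the sketch below applies partial E-steps to a Gaussian mixture model: each pass updates the sufficient statistics of one block of data cases and immediately re-estimates the parameters, rather than scanning the whole database before every M-step. The function name `block_em_gmm`, the diagonal-covariance Gaussian mixture, and the block sizes are illustrative assumptions, not the paper's exact algorithm.

```python
# A minimal sketch of incremental (block-wise) EM for a diagonal-covariance
# Gaussian mixture. Illustrative only; the specific model and names are assumptions.
import numpy as np

def block_em_gmm(X, K, n_blocks=10, n_passes=5, seed=0):
    rng = np.random.default_rng(seed)
    n, d = X.shape
    # Initialise parameters from random data points.
    means = X[rng.choice(n, K, replace=False)].copy()
    covs = np.tile(np.var(X, axis=0), (K, 1))      # diagonal covariances, shape (K, d)
    weights = np.full(K, 1.0 / K)

    # Per-block sufficient statistics, kept between passes.
    blocks = np.array_split(np.arange(n), n_blocks)
    S0 = np.zeros((n_blocks, K))                    # sum of responsibilities
    S1 = np.zeros((n_blocks, K, d))                 # sum of r * x
    S2 = np.zeros((n_blocks, K, d))                 # sum of r * x^2

    def responsibilities(Xb):
        # log N(x | mu_k, diag(cov_k)) + log weight_k, normalised per row.
        log_p = -0.5 * (((Xb[:, None, :] - means[None]) ** 2) / covs[None]
                        + np.log(2 * np.pi * covs[None])).sum(-1)
        log_p += np.log(weights)
        log_p -= log_p.max(axis=1, keepdims=True)
        r = np.exp(log_p)
        return r / r.sum(axis=1, keepdims=True)

    for _ in range(n_passes):
        for b, idx in enumerate(blocks):
            Xb = X[idx]
            r = responsibilities(Xb)                # partial E-step: one block only
            S0[b] = r.sum(axis=0)
            S1[b] = r.T @ Xb
            S2[b] = r.T @ (Xb ** 2)
            # M-step from current totals (other blocks keep their old statistics).
            T0, T1, T2 = S0.sum(0), S1.sum(0), S2.sum(0)
            weights = T0 / T0.sum()
            means = T1 / T0[:, None]
            covs = np.maximum(T2 / T0[:, None] - means ** 2, 1e-6)
    return weights, means, covs
```

Because only one block is visited per parameter update, each iteration touches a fraction of the records, which is where the computational savings on large databases come from in this kind of scheme.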