Learning from Infinite Data in Finite Time

We propose the following general method for scaling learning algorithms to arbitrarily large data sets. Consider the model Mn→ learned by the algorithm using ni examples in step i (n→ = (n1, ..., nm)), and the model M∞ that would be learned using infinite examples. Upper-bound the loss L(Mn→, M∞) between them as a function of n→, and then minimize the algorithm's time complexity f(n→) subject to the constraint that L(Mn→, M∞) exceed ε with probability at most δ. We apply this method to the EM algorithm for mixtures of Gaussians. Preliminary experiments on a series of large data sets provide evidence of the potential of this approach.
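To make the recipe concrete, below is a minimal sketch, not the paper's actual bounds or algorithm: it assumes a simple Hoeffding-style bound on each step's sufficient statistics, splits δ uniformly across a fixed budget of EM steps (a union bound, i.e. choosing n→ with equal components rather than optimizing the full vector), and uses unit-variance spherical Gaussians. The names step_sample_size and subsampled_em are hypothetical.

```python
import numpy as np

def step_sample_size(eps_i, delta_i, value_range=1.0):
    """Hoeffding-style bound: averaging n >= R^2 ln(2/delta_i) / (2 eps_i^2)
    bounded observations keeps the average within eps_i of its expectation
    with probability >= 1 - delta_i."""
    return int(np.ceil(value_range ** 2 * np.log(2.0 / delta_i)
                       / (2.0 * eps_i ** 2)))

def subsampled_em(X, k, eps, delta, max_steps=50, seed=0):
    """EM for a mixture of unit-variance spherical Gaussians, with each step's
    sufficient statistics estimated from a subsample just large enough to meet
    a per-step bound (delta split uniformly over the step budget)."""
    rng = np.random.default_rng(seed)
    n_total, d = X.shape
    mu = X[rng.choice(n_total, size=k, replace=False)]   # initial means
    w = np.full(k, 1.0 / k)                              # mixing weights
    n_i = min(n_total, step_sample_size(eps, delta / max_steps))
    for _ in range(max_steps):
        S = X[rng.choice(n_total, size=n_i, replace=False)]
        # E-step: responsibilities under unit-variance Gaussians
        logp = -0.5 * ((S[:, None, :] - mu[None]) ** 2).sum(-1) + np.log(w)
        logp -= logp.max(axis=1, keepdims=True)          # for stability
        r = np.exp(logp)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: means and weights from the subsample's statistics
        new_mu = (r.T @ S) / (r.sum(axis=0)[:, None] + 1e-12)
        w = r.mean(axis=0)
        if np.max(np.abs(new_mu - mu)) < eps:            # converged within eps
            return new_mu, w
        mu = new_mu
    return mu, w
```

The key design point the sketch illustrates is that the sample size per step is driven by (ε, δ) rather than by the size of the data set, so the total running time is bounded no matter how many examples are available.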