Optimal Bayesian Online Learning

In a Bayesian approach to online learning, a simple parametric approximate posterior over rules is updated in each online learning step. Predictions on new data are derived from averages over this posterior. This should be compared to the Bayes-optimal batch (or offline) approach, for which the posterior is calculated from the prior and the likelihood of the whole training set. We suggest that minimizing the difference between the batch posterior and the approximate posterior will optimize the performance of the Bayesian online algorithm. This general principle is demonstrated for three scenarios: learning a linear perceptron rule, and learning a binary classification rule in the simple perceptron with a binary or continuous weight prior.
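As a minimal sketch of the online scheme described above (not the paper's own derivation), consider the linear perceptron scenario with a Gaussian approximate posterior over the weights. Conditioning a Gaussian posterior on one example at a time gives a Kalman-style update; for this linear-Gaussian case the online posterior actually coincides with the batch posterior, which illustrates why matching the two is the natural optimality criterion. The teacher dimension, noise level, and number of examples below are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Teacher: a linear perceptron rule y = w_true . x + Gaussian noise.
d = 5
w_true = rng.normal(size=d)
noise_var = 0.1

# Parametric approximate posterior over weights: Gaussian N(mu, Sigma),
# initialized to the prior N(0, I).
mu = np.zeros(d)
Sigma = np.eye(d)

for _ in range(500):
    x = rng.normal(size=d)
    y = w_true @ x + rng.normal(scale=np.sqrt(noise_var))

    # One online learning step: condition the Gaussian posterior on (x, y).
    # For a linear rule with Gaussian noise this update is exact; for a
    # nonlinear rule one would project the true posterior back onto the
    # Gaussian family after each step.
    s = x @ Sigma @ x + noise_var            # predictive variance of y
    k = Sigma @ x / s                        # gain vector
    mu = mu + k * (y - mu @ x)               # posterior mean update
    Sigma = Sigma - np.outer(k, x @ Sigma)   # posterior covariance update

# Predictions on new data average over the posterior: the mean prediction
# for an input x is simply mu . x.
print("weight error:", np.linalg.norm(mu - w_true))
```

The posterior mean converges to the teacher weights and the posterior covariance shrinks as examples accumulate, mirroring the batch posterior computed from the whole training set.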