ACHIEVABILITY OF ASYMPTOTIC MINIMAX OPTIMALITY IN ONLINE AND BATCH CODING

The normalized maximum likelihood (NML) model achieves the minimax coding (log-loss) regret for data of fixed sample size n. However, it is a batch strategy, i.e., it requires that n be known in advance. Furthermore, it is computationally infeasible for most statistical models, and several computationally feasible alternative strategies have been devised. We characterize the achievability of asymptotic minimaxity by batch strategies (i.e., strategies that may depend on n) as well as online strategies (i.e., strategies independent of n). On the one hand, we conjecture that for a large class of models, no online strategy can be asymptotically minimax; we prove that this holds under a slightly stronger definition of asymptotic minimaxity. On the other hand, we show that in the multinomial model, a Bayes mixture defined by the conjugate Dirichlet prior with a simple dependency on n achieves asymptotic minimaxity for all sequences, thus providing a simpler asymptotic minimax strategy than that of earlier work by Xie and Barron. Our numerical results also demonstrate the superior finite-sample behavior of a number of novel batch algorithms.
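
The quantities discussed above can be made concrete for small multinomial alphabets. The following is a minimal sketch (not from the paper; function names and the brute-force enumeration are our own illustrative choices) that computes the minimax regret log C(n, m), i.e., the log-normalizer of the NML distribution, and the pointwise regret of a Bayes mixture under a symmetric Dirichlet(α) prior, so the two coding strategies can be compared numerically:

```python
import itertools
import math

def nml_log_normalizer(n, m):
    """log C(n, m): log-sum of maximized likelihoods over all count
    vectors (c1, ..., cm) with c1 + ... + cm = n. Brute force; only
    feasible for small n and m (an illustration, not the linear-time
    algorithm of reference [9])."""
    total = 0.0
    for counts in itertools.product(range(n + 1), repeat=m):
        if sum(counts) != n:
            continue
        coef = math.factorial(n)
        for c in counts:
            coef //= math.factorial(c)
        lik = 1.0
        for c in counts:
            if c > 0:
                lik *= (c / n) ** c
        total += coef * lik
    return math.log(total)

def dirichlet_mixture_logprob(counts, alpha):
    """Log marginal probability of one sequence with the given counts
    under a symmetric Dirichlet(alpha) prior on the multinomial."""
    n, m = sum(counts), len(counts)
    lp = math.lgamma(m * alpha) - math.lgamma(n + m * alpha)
    for c in counts:
        lp += math.lgamma(c + alpha) - math.lgamma(alpha)
    return lp

def mixture_regret(counts, alpha):
    """Pointwise log-loss regret of the Dirichlet mixture relative to
    the maximum-likelihood distribution for the observed counts."""
    n = sum(counts)
    max_loglik = sum(c * math.log(c / n) for c in counts if c > 0)
    return max_loglik - dirichlet_mixture_logprob(counts, alpha)
```

By the minimax property of NML, the worst-case regret of any fixed Dirichlet mixture (e.g., the Krichevsky–Trofimov choice α = 1/2) over sequences of length n is at least log C(n, m), which is what an n-dependent choice of prior is designed to mitigate asymptotically.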

[1] Raphail E. Krichevsky et al., "The performance of universal encoding," IEEE Trans. Inf. Theory, 1981.

[2] Yoav Freund et al., "Predicting a binary sequence almost as well as the optimal biased coin," COLT '96, 2003.

[3] Jorma Rissanen et al., "Fisher information and stochastic complexity," IEEE Trans. Inf. Theory, 1996.

[4] Andrew R. Barron et al., "Asymptotic minimax regret for data compression, gambling, and prediction," IEEE Trans. Inf. Theory, 1997.

[5] Manfred K. Warmuth et al., "The Last-Step Minimax Algorithm," ALT, 2000.

[6] Manfred K. Warmuth et al., "Relative Loss Bounds for On-Line Density Estimation with the Exponential Family of Distributions," Machine Learning, 1999.

[7] Nicolò Cesa-Bianchi et al., "Worst-Case Bounds for the Logarithmic Loss of Predictors," Machine Learning, 1999.

[8] A. Barron et al., "Asymptotically minimax regret for exponential families," 2005.

[9] Petri Myllymäki et al., "A linear-time algorithm for computing the multinomial stochastic complexity," Inf. Process. Lett., 2007.

[10] Tomi Silander et al., "On Sensitivity of the MAP Bayesian Network Structure to the Equivalent Sample Size Parameter," UAI, 2007.

[11] Jorma Rissanen et al., "Minimum Description Length Principle," Encyclopedia of Machine Learning, 2010.

[12] Tomi Silander et al., "Learning locally minimax optimal Bayesian networks," Int. J. Approx. Reason., 2010.

[13] Wojciech Kotlowski et al., "Maximum Likelihood vs. Sequential Normalized Maximum Likelihood in On-line Density Estimation," COLT, 2011.

[14] Peter L. Bartlett et al., "Horizon-Independent Optimal Prediction with Log-Loss in Exponential Families," COLT, 2013.