How to Achieve Minimax Expected Kullback-Leibler Distance from an Unknown Finite Distribution

We consider a problem that is related to the "Universal Encoding Problem" from information theory. The basic goal is to find rules that map "partial information" about a distribution X over an m-letter alphabet into a guess X̂ for X such that the Kullback-Leibler divergence between X and X̂ is as small as possible. The cost associated with a rule is the maximal expected Kullback-Leibler divergence between X and X̂. First, we show that the cost associated with the well-known add-one rule equals ln(1 + (m-1)/(n+1)), thereby extending a result of Forster and Warmuth [3,2] to m ≥ 3. Second, we derive an absolute (as opposed to asymptotic) lower bound on the smallest possible cost. Technically, this is done by determining (almost exactly) the Bayes error of the add-one rule under a uniform prior (where only the asymptotics for n → ∞ was known before). Third, we point to tools from approximation theory and support the conjecture that there exists a rule whose cost asymptotically matches the theoretical barrier established by the lower bound.
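The following is a minimal numerical sketch, not code from the paper, illustrating the first claim under the assumption that the "partial information" is an i.i.d. sample of size n drawn from X. For small m and n it computes the exact expected Kullback-Leibler divergence of the add-one (Laplace) rule by enumerating all count vectors, and compares it with ln(1 + (m-1)/(n+1)); the helper names (add_one_estimate, expected_kl) and the choice of candidate distributions are illustrative, not notation from the paper.

```python
import math
from itertools import combinations_with_replacement

def add_one_estimate(counts, n, m):
    """Add-one (Laplace) rule: add 1 to every letter count, then normalize."""
    return [(c + 1) / (n + m) for c in counts]

def expected_kl(x, n):
    """Exact expected KL(X || X_hat) of the add-one rule under distribution x,
    averaging over all samples of size n drawn i.i.d. from x."""
    m = len(x)
    total = 0.0
    # Enumerate all multisets of n letters, i.e. all possible count vectors.
    for sample in combinations_with_replacement(range(m), n):
        counts = [0] * m
        for letter in sample:
            counts[letter] += 1
        # Multinomial probability of observing this count vector under x.
        prob = math.factorial(n)
        for i in range(m):
            prob *= x[i] ** counts[i] / math.factorial(counts[i])
        if prob == 0.0:
            continue
        x_hat = add_one_estimate(counts, n, m)
        kl = sum(p * math.log(p / q) for p, q in zip(x, x_hat) if p > 0)
        total += prob * kl
    return total

if __name__ == "__main__":
    m, n = 3, 4
    bound = math.log(1 + (m - 1) / (n + 1))
    candidates = [
        [1.0, 0.0, 0.0],        # point mass
        [0.5, 0.3, 0.2],
        [1 / 3, 1 / 3, 1 / 3],  # uniform
    ]
    for x in candidates:
        print(x, expected_kl(x, n), "<=", bound)
```

A point mass makes the bound tight in this sketch: the sample then consists of n copies of a single letter, the add-one estimate assigns that letter probability (n+1)/(n+m), and the resulting divergence is ln((n+m)/(n+1)) = ln(1 + (m-1)/(n+1)).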