Exp-Concavity of Proper Composite Losses

The goal of online prediction with expert advice is to find a decision strategy that performs almost as well as the best expert in a given pool, on any sequence of outcomes. This problem has been widely studied: O(√T) and O(log T) regret bounds can be achieved for convex losses (Zinkevich (2003)) and for strictly convex losses with bounded first and second derivatives (Hazan et al. (2007)), respectively. In special cases, such as the Aggregating Algorithm (Vovk (1995)) with mixable losses and the Weighted Average Algorithm (Kivinen and Warmuth (1999)) with exp-concave losses, it is possible to achieve O(1) regret bounds. van Erven (2012) has argued that mixability and exp-concavity are roughly equivalent under certain conditions; thus, by understanding the relationship between these two notions, we can obtain the best of both algorithms: the strong theoretical performance guarantees of the Aggregating Algorithm and the computational efficiency of the Weighted Average Algorithm. In this paper we provide a complete characterization of the exp-concavity of any proper composite loss. Using this characterization and the mixability condition of proper losses (Van Erven et al. (2012)), we show that it is possible to transform (reparameterize) any β-mixable binary proper loss into a β-exp-concave composite loss with the same β. For the multi-class case, we propose an approximation approach to this transformation.
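As background for the O(1) regret claim, the following is a minimal sketch of the Weighted Average Algorithm (exponentially weighted average forecaster) run with log loss, which is 1-exp-concave; for an η-exp-concave loss, the cumulative regret against the best of K experts is at most (log K)/η, independent of the horizon T. The function names and the binary-prediction setup here are illustrative assumptions, not notation from the paper.

```python
import math

def log_loss(p, y):
    """Log loss for a binary outcome y in {0, 1} and a prediction p in (0, 1)."""
    return -math.log(p if y == 1 else 1.0 - p)

def weighted_average_forecaster(expert_preds, outcomes, eta=1.0):
    """Exponentially weighted average forecaster (Weighted Average Algorithm).

    expert_preds: list of T rounds, each a list of K expert predictions in (0, 1).
    outcomes:     list of T binary outcomes in {0, 1}.
    eta:          learning rate; for an eta-exp-concave loss the regret
                  against the best expert is at most log(K) / eta.

    Returns (cumulative learner loss, list of per-expert cumulative losses).
    """
    K = len(expert_preds[0])
    log_w = [0.0] * K          # log-weights, initially uniform
    learner_loss = 0.0
    expert_loss = [0.0] * K
    for preds, y in zip(expert_preds, outcomes):
        # Normalize weights (in log-space for stability) and predict
        # the weighted average of the expert predictions.
        m = max(log_w)
        w = [math.exp(lw - m) for lw in log_w]
        z = sum(w)
        p = sum(wi * pi for wi, pi in zip(w, preds)) / z
        learner_loss += log_loss(p, y)
        # Multiplicative update: w_i <- w_i * exp(-eta * loss_i).
        for i, pi in enumerate(preds):
            li = log_loss(pi, y)
            expert_loss[i] += li
            log_w[i] -= eta * li
    return learner_loss, expert_loss
```

With log loss and η = 1 this forecaster coincides with the Bayesian mixture over experts, so its cumulative regret is bounded by log K no matter how long the sequence runs, in contrast to the O(√T) bound for general convex losses.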

[1] Elad Hazan, et al. Logarithmic regret algorithms for online convex optimization, 2006, Machine Learning.

[2] Tim van Erven, et al. From Exp-concavity to Mixability, 2013.

[3] D. Hand, et al. Local Versus Global Models for Classification Problems, 2003.

[4] V. Vovk. Competitive On-line Statistics, 2001.

[5] Mark D. Reid, et al. Mixability is Bayes Risk Curvature Relative to Log Loss, 2011, COLT.

[6] Mark D. Reid, et al. Composite Binary Losses, 2009, J. Mach. Learn. Res.

[7] Manfred K. Warmuth, et al. Averaging Expert Predictions, 1999, EuroCOLT.

[8] David J. Hand, et al. Deconstructing Statistical Questions, 1994.

[9] Vladimir Vovk, et al. A game of prediction with expert advice, 1995, COLT '95.

[10] R. Tyrrell Rockafellar, et al. Convex Analysis, 1970, Princeton Landmarks in Mathematics and Physics.

[11] Xin Guo, et al. On the optimality of conditional expectation as a Bregman predictor, 2005, IEEE Trans. Inf. Theory.

[12] Mark D. Reid, et al. Information, Divergence and Risk for Binary Experiments, 2009, J. Mach. Learn. Res.

[13] David Haussler, et al. Sequential Prediction of Individual Sequences Under General Loss Functions, 1998, IEEE Trans. Inf. Theory.

[14] Mark D. Reid, et al. Composite Multiclass Losses, 2011, J. Mach. Learn. Res.

[15] S. Dragomir. Some Gronwall Type Inequalities and Applications, 2003.

[16] Vladimir Vovk, et al. Prediction with expert advice for the Brier game, 2007, ICML '08.

[17] A. Buja, et al. Loss Functions for Binary Class Probability Estimation and Classification: Structure and Applications, 2005.

[18] Robert C. Williamson, et al. The Geometry of Losses, 2014, COLT.

[19] Yuri Kalnishkan, et al. The weak aggregating algorithm and weak mixability, 2008, J. Comput. Syst. Sci.

[20] A. Raftery, et al. Strictly Proper Scoring Rules, Prediction, and Estimation, 2007.

[21] Martin Zinkevich, et al. Online Convex Programming and Generalized Infinitesimal Gradient Ascent, 2003, ICML.