Fast rates with high probability in exp-concave statistical learning

We present an algorithm for the statistical learning setting with a bounded exp-concave loss in $d$ dimensions that obtains excess risk $O(d \log(1/\delta)/n)$ with probability at least $1 - \delta$. The core technique is to boost the confidence of recent in-expectation $O(d/n)$ excess risk bounds for empirical risk minimization (ERM), without sacrificing the rate, by leveraging a Bernstein condition which holds due to exp-concavity. We also show that, with probability at least $1 - \delta$, standard ERM obtains excess risk $O(d (\log(n) + \log(1/\delta))/n)$. We further show that a regret bound for any online learner in this setting translates to a high-probability excess risk bound for the corresponding online-to-batch conversion of that learner. Lastly, we present two high-probability bounds for the exp-concave model selection aggregation problem that are quantile-adaptive in a certain sense. The first bound holds for a purely exponential weights type algorithm, achieves a nearly optimal rate, and has no explicit dependence on the Lipschitz continuity of the loss. The second bound requires Lipschitz continuity but achieves the optimal rate.
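
To make the confidence-boosting idea concrete, the sketch below shows one standard version of such a scheme: split half the sample into roughly $\log(1/\delta)$ chunks, run ERM on each chunk to produce candidate hypotheses, and select the candidate with the smallest empirical risk on the held-out half. This is only an illustrative sketch under stated assumptions, not the paper's precise algorithm: the callables `erm` and `loss`, the equal train/validation split, and the chunk count are placeholders, and the claim that selection preserves the fast rate rests on the Bernstein condition discussed above.

```python
# Minimal sketch of a generic "boost the confidence" scheme for ERM.
# Assumptions (not from the paper's pseudocode): `erm` fits a hypothesis from a
# list of examples, `loss(h, z)` is a bounded exp-concave loss, and n is large
# relative to log(1/delta).
import math
import numpy as np

def boost_confidence_erm(data, erm, loss, delta=0.05):
    """Return a hypothesis intended to have small excess risk with prob. >= 1 - delta.

    data : indexable sequence of i.i.d. examples
    erm  : callable mapping a list of examples to a fitted hypothesis
    loss : callable loss(h, example) -> float
    """
    n = len(data)
    half = n // 2
    train, valid = data[:half], data[half:]

    # Number of independent ERM candidates, on the order of log(1/delta).
    k = max(1, math.ceil(math.log(1.0 / delta)))

    # Fit one ERM candidate per chunk of the training half.
    chunks = np.array_split(np.arange(half), k)
    candidates = [erm([train[i] for i in idx]) for idx in chunks]

    # Select the candidate with smallest empirical risk on the validation half.
    val_risks = [np.mean([loss(h, z) for z in valid]) for h in candidates]
    return candidates[int(np.argmin(val_risks))]
```

The role of the split is to convert the in-expectation guarantee of each ERM candidate into a high-probability guarantee for the selected hypothesis; the point emphasized in the abstract is that, under the Bernstein condition implied by exp-concavity, the validation step contributes only an $O(\log(1/\delta)/n)$ term rather than the slow $O(\sqrt{\log(1/\delta)/n})$ penalty.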
