Generalized Policy Elimination: an efficient algorithm for Nonparametric Contextual Bandits
[1] John Langford,et al. Taming the Monster: A Fast and Simple Algorithm for Contextual Bandits , 2014, ICML.
[2] Aurélien F. Bibaut,et al. Fast rates for empirical risk minimization over càdlàg functions with bounded sectional variation norm , 2019 .
[3] Michael I. Jordan,et al. Convexity, Classification, and Risk Bounds , 2006 .
[4] Aurélien F. Bibaut,et al. Fast rates for empirical risk minimization with cadlag losses with bounded sectional variation norm , 2019, 1907.09244.
[5] R. Dudley. The Sizes of Compact Subsets of Hilbert Space and Continuity of Gaussian Processes , 1967 .
[6] Stefan Wager,et al. Efficient Policy Learning , 2017, ArXiv.
[7] Peter Auer,et al. The Nonstochastic Multiarmed Bandit Problem , 2002, SIAM J. Comput..
[8] Akshay Krishnamurthy,et al. Contextual bandits with surrogate losses: Margin bounds and efficient algorithms , 2018, NeurIPS.
[9] Mark J. van der Laan,et al. The Highly Adaptive Lasso Estimator , 2016, 2016 IEEE International Conference on Data Science and Advanced Analytics (DSAA).
[10] Peter L. Bartlett,et al. Online learning with kernel losses , 2018, ICML.
[11] Jon A. Wellner,et al. Weak Convergence and Empirical Processes: With Applications to Statistics , 1996 .
[12] Vianney Perchet,et al. The multi-armed bandit problem with covariates , 2011, ArXiv.
[13] A. Chambaz,et al. Performance Guarantees for Policy Learning , 2020, Annales de l'I.H.P. Probabilités et statistiques.
[14] Haipeng Luo,et al. Practical Contextual Bandits with Regression Oracles , 2018, ICML.
[15] John Langford,et al. Efficient Optimal Learning for Contextual Bandits , 2011, UAI.
[16] S. Geer. Applications of empirical process theory , 2000 .
[17] Adityanand Guntuboyina,et al. Multivariate extensions of isotonic regression and total variation denoising via entire monotonicity and Hardy–Krause variation , 2019, 1903.01395.
[18] Manfred K. Warmuth,et al. The Weighted Majority Algorithm , 1994, Inf. Comput..
[19] Philippe Rigollet,et al. Nonparametric Bandits with Covariates , 2010, COLT.
[20] Ramon van Handel. On the minimal penalty for Markov order estimation , 2009, ArXiv.
[21] P. Massart,et al. Concentration inequalities and model selection , 2007 .
[22] Vladimir Vovk,et al. Aggregating strategies , 1990, COLT '90.
[23] John Langford,et al. The Epoch-Greedy Algorithm for Multi-armed Bandits with Side Information , 2007, NIPS.
[24] Haipeng Luo,et al. Oracle-efficient Online Learning and Auction Design , 2020, J. ACM.
[25] Csaba Szepesvári,et al. Multiclass Classification Calibration Functions , 2016, ArXiv.
[26] Soumendu Sundar Mukherjee,et al. Weak convergence and empirical processes , 2019 .
[27] Claudio Gentile,et al. Algorithmic Chaining and the Role of Partial Feedback in Online Nonparametric Learning , 2017, COLT.