Counterfactual Risk Minimization: Learning from Logged Bandit Feedback
[1] D. Rubin, et al. The central role of the propensity score in observational studies for causal effects, 1983.
[2] Vladimir Vapnik. Statistical learning theory, 1998.
[3] Peter L. Bartlett, et al. Neural Network Learning: Theoretical Foundations, 1999.
[4] John Langford, et al. Beating the hold-out: bounds for K-fold and progressive cross-validation, 1999, COLT '99.
[5] Andrew McCallum, et al. Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data, 2001, ICML.
[6] John Langford, et al. Cost-sensitive learning by cost-proportionate example weighting, 2003, Third IEEE International Conference on Data Mining.
[7] Thomas Hofmann, et al. Support vector machine learning for interdependent and structured output spaces, 2004, ICML.
[8] John Langford, et al. The Epoch-Greedy Algorithm for Multi-armed Bandits with Side Information, 2007, NIPS.
[9] E. Ionides. Truncated Importance Sampling, 2008.
[10] John Langford, et al. Exploration scavenging, 2008, ICML '08.
[11] Massimiliano Pontil, et al. Empirical Bernstein Bounds and Sample-Variance Penalization, 2009, COLT.
[12] John Langford, et al. The offset tree for learning with partial labels, 2008, KDD.
[13] Lihong Li, et al. Learning from Logged Implicit Exploration Data, 2010, NIPS.
[14] Yoram Singer, et al. Adaptive Subgradient Methods for Online Learning and Stochastic Optimization, 2011, J. Mach. Learn. Res.
[15] S. V. N. Vishwanathan, et al. A Quasi-Newton Approach to Nonsmooth Convex Optimization Problems in Machine Learning, 2008, J. Mach. Learn. Res.
[16] Wei Chu, et al. A contextual-bandit approach to personalized news article recommendation, 2010, WWW '10.
[17] Gaël Varoquaux, et al. Scikit-learn: Machine Learning in Python, 2011, J. Mach. Learn. Res.
[18] Sergio Herrero-Lopez. Multiclass Support Vector Machine, 2011.
[19] John Langford, et al. Doubly Robust Policy Evaluation and Learning, 2011, ICML.
[20] Wei Chu, et al. Unbiased offline evaluation of contextual-bandit-based news article recommendation algorithms, 2010, WSDM '11.
[21] Javier García, et al. Safe Exploration of State and Action Spaces in Reinforcement Learning, 2012, J. Artif. Intell. Res.
[22] Thorsten Joachims, et al. Multi-armed Bandit Problems with History, 2012, AISTATS.
[23] A. Lewis, et al. Nonsmooth optimization via quasi-Newton methods, 2012, Mathematical Programming.
[24] Joaquin Quiñonero Candela, et al. Counterfactual reasoning and learning systems: the example of computational advertising, 2013, J. Mach. Learn. Res.
[25] Katja Hofmann, et al. Reusing historical interaction data for faster online learning to rank for IR, 2013, DIR.
[26] Michèle Sebag, et al. Exploration vs Exploitation vs Safety: Risk-Aware Multi-Armed Bandits, 2013, ACML.
[27] Lihong Li, et al. On Minimax Optimal Offline Policy Evaluation, 2014, ArXiv.
[28] Lihong Li, et al. Counterfactual Estimation and Optimization of Click Metrics for Search Engines, 2014, ArXiv.
[29] John Langford, et al. Taming the Monster: A Fast and Simple Algorithm for Contextual Bandits, 2014, ICML.
[30] Olivier Nicol, et al. Improving offline evaluation of contextual bandit algorithms via bootstrapping techniques, 2014, ICML.
[31] Lihong Li, et al. Toward Minimax Off-policy Value Estimation, 2015, AISTATS.
[32] Thorsten Joachims, et al. Counterfactual Risk Minimization, 2015, ICML.
[33] Philip S. Thomas, et al. High-Confidence Off-Policy Evaluation, 2015, AAAI.
[34] Patrick J. F. Groenen, et al. GenSVM: A Generalized Multiclass Support Vector Machine, 2016, J. Mach. Learn. Res.