[1] J. Langford, et al. The Epoch-Greedy algorithm for contextual multi-armed bandits, 2007, NIPS 2007.
[2] Xiaojin Zhu, et al. Introduction to Semi-Supervised Learning, 2009, Synthesis Lectures on Artificial Intelligence and Machine Learning.
[3] Wei Chu, et al. Unbiased offline evaluation of contextual-bandit-based news article recommendation algorithms, 2010, WSDM '11.
[4] Peter Auer, et al. The Nonstochastic Multiarmed Bandit Problem, 2002, SIAM J. Comput.
[5] Yaoliang Yu, et al. Analysis of Kernel Mean Matching under Covariate Shift, 2012, ICML.
[6] D. Rubin, et al. Reducing Bias in Observational Studies Using Subclassification on the Propensity Score, 1984.
[7] Doina Precup, et al. Eligibility Traces for Off-Policy Policy Evaluation, 2000, ICML.
[8] Maxim Raginsky, et al. Information-Based Complexity, Feedback and Dynamics in Convex Programming, 2010, IEEE Transactions on Information Theory.
[9] Michael I. Jordan, et al. PEGASUS: A policy search method for large MDPs and POMDPs, 2000, UAI.
[10] John Langford, et al. Exploration scavenging, 2008, ICML '08.
[11] Joaquin Quiñonero Candela, et al. Counterfactual reasoning and learning systems: the example of computational advertising, 2013, J. Mach. Learn. Res.
[12] Sanjoy Dasgupta, et al. Two faces of active learning, 2009, Theor. Comput. Sci.
[13] Lihong Li, et al. Learning from Logged Implicit Exploration Data, 2010, NIPS.
[14] D. Rubin, et al. The central role of the propensity score in observational studies for causal effects, 1983.
[15] Masashi Sugiyama, et al. Input-dependent estimation of generalization error under covariate shift, 2005.
[16] Christian Igel, et al. Hoeffding and Bernstein races for selecting policies in evolutionary direct policy search, 2009, ICML '09.
[17] Eli Upfal, et al. Probability and Computing: Randomized Algorithms and Probabilistic Analysis, 2005.
[18] G. Imbens, et al. Efficient Estimation of Average Treatment Effects Using the Estimated Propensity Score, 2000.
[19] Richard S. Sutton, et al. A Convergent O(n) Temporal-difference Algorithm for Off-policy Learning with Linear Function Approximation, 2008, NIPS.