[1] J. Robins, et al. Semiparametric Efficiency in Multivariate Regression Models with Missing Data, 1995.
[2] John Langford, et al. Doubly Robust Policy Evaluation and Learning, 2011, ICML.
[3] Thorsten Joachims, et al. Counterfactual Risk Minimization: Learning from Logged Bandit Feedback, 2015, ICML.
[4] Nathan Kallus, et al. Policy Evaluation and Optimization with Continuous Treatments, 2018, AISTATS.
[5] M. Fukushima, et al. A generalized proximal point algorithm for certain non-convex minimization problems, 1981.
[6] R. Rockafellar. Monotone Operators and the Proximal Point Algorithm, 1976.
[7] Nan Jiang, et al. Doubly Robust Off-policy Value Evaluation for Reinforcement Learning, 2015, ICML.
[8] Gaël Varoquaux, et al. Scikit-learn: Machine Learning in Python, 2011, J. Mach. Learn. Res.
[9] Ronald J. Williams. Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning, 1992, Machine Learning.
[10] Jorge Nocedal, et al. On the limited memory BFGS method for large scale optimization, 1989, Math. Program.
[11] Thorsten Joachims, et al. The Self-Normalized Estimator for Counterfactual Learning, 2015, NIPS.
[12] John Langford, et al. The Epoch-Greedy Algorithm for Multi-armed Bandits with Side Information, 2007, NIPS.
[13] Zaïd Harchaoui, et al. Catalyst for Gradient-based Nonconvex Optimization, 2018, AISTATS.
[14] Keying Ye, et al. Applied Bayesian Modeling and Causal Inference From Incomplete-Data Perspectives, 2005, Technometrics.
[15] G. Imbens, et al. The Propensity Score with Continuous Treatments, 2005.
[16] Yishay Mansour, et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation, 1999, NIPS.
[17] Miroslav Dudík, et al. Optimal and Adaptive Off-policy Evaluation in Contextual Bandits, 2016, ICML.
[18] Dimitris Bertsimas, et al. Optimization over Continuous and Multi-dimensional Decisions with Observational Data, 2018, NeurIPS.
[19] Massimiliano Pontil, et al. Empirical Bernstein Bounds and Sample-Variance Penalization, 2009, COLT.
[20] Vasilis Syrgkanis, et al. Semi-Parametric Efficient Policy Learning with Continuous Actions, 2019, NeurIPS.
[21] M. de Rijke, et al. Large-scale Validation of Counterfactual Learning Methods: A Test-Bed, 2016, ArXiv.
[22] M. de Rijke, et al. Deep Learning with Logged Bandit Feedback, 2018, ICLR.
[23] John Langford, et al. Taming the Monster: A Fast and Simple Algorithm for Contextual Bandits, 2014, ICML.
[24] Vasilis Syrgkanis, et al. Orthogonal Statistical Learning, 2019, The Annals of Statistics.
[25] R. Altman, et al. Estimation of the warfarin dose with clinical and pharmacogenetic data, 2009, The New England Journal of Medicine.
[26] D. Horvitz, et al. A Generalization of Sampling Without Replacement from a Finite Universe, 1952.
[27] C. Barnes, et al. Drug dosage in laboratory animals: a handbook, 1964.
[28] Joaquin Quiñonero Candela, et al. Counterfactual reasoning and learning systems: the example of computational advertising, 2013, J. Mach. Learn. Res.
[29] John Langford, et al. Approximately Optimal Approximate Reinforcement Learning, 2002, ICML.
[30] Julien Mairal, et al. Cyanure: An Open-Source Toolbox for Empirical Risk Minimization for Python, C++, and soon more, 2019, ArXiv.
[31] Nicolas Le Roux, et al. Understanding the impact of entropy on policy optimization, 2018, ICML.
[32] Wei Chu, et al. An unbiased offline evaluation of contextual bandit algorithms with generalized linear models, 2011.
[33] Alec Radford, et al. Proximal Policy Optimization Algorithms, 2017, ArXiv.