Importance Sampling with Unequal Support

Importance sampling is often used in machine learning when training and testing data come from different distributions. In this paper we propose a new variant of importance sampling that can reduce the variance of importance sampling-based estimates by orders of magnitude when the supports of the training and testing distributions differ. After motivating and presenting our new importance sampling estimator, we provide a detailed theoretical analysis that characterizes both its bias and its variance relative to the ordinary importance sampling estimator in various settings, including cases where ordinary importance sampling is biased while our new estimator is not, and vice versa. We conclude with an example of how our new estimator can improve estimates of how well a new treatment policy for diabetes will work for an individual, using only data collected while that individual followed a previous treatment policy.
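
For reference, the ordinary importance sampling estimator that the abstract uses as its baseline can be sketched as follows. This is a minimal Python illustration under hypothetical assumptions, not the authors' new estimator: the training (sampling) distribution is taken to be Uniform(0, 2) and the testing (target) distribution Uniform(0, 1), so the testing support is a strict subset of the training support. Each training sample x is weighted by q(x)/p(x); samples outside the testing support receive weight zero but still count toward the 1/n normalization. The helper names `uniform_pdf` and `ordinary_is` are illustrative only.

```python
import random


def uniform_pdf(x: float, low: float, high: float) -> float:
    """Density of Uniform(low, high) evaluated at x."""
    return 1.0 / (high - low) if low <= x <= high else 0.0


def ordinary_is(samples, f, target_pdf, sampling_pdf):
    """Ordinary importance sampling estimate of E_target[f(X)].

    Each sample x (drawn from the sampling distribution) is weighted by
    target_pdf(x) / sampling_pdf(x). Samples outside the target's support
    get weight 0 but still count toward the 1/n normalization.
    """
    total = 0.0
    for x in samples:
        weight = target_pdf(x) / sampling_pdf(x)
        total += weight * f(x)
    return total / len(samples)


def target_pdf(x: float) -> float:
    # Hypothetical "testing" distribution: Uniform(0, 1).
    return uniform_pdf(x, 0.0, 1.0)


def sampling_pdf(x: float) -> float:
    # Hypothetical "training" distribution: Uniform(0, 2), wider support.
    return uniform_pdf(x, 0.0, 2.0)


if __name__ == "__main__":
    rng = random.Random(0)
    n = 100_000
    samples = [rng.uniform(0.0, 2.0) for _ in range(n)]
    estimate = ordinary_is(samples, lambda x: x, target_pdf, sampling_pdf)
    in_support = sum(1 for x in samples if x <= 1.0)
    print(f"estimate={estimate:.4f} (true value 0.5)")
    print(f"samples inside target support: {in_support}/{n}")
```

Only about half of the samples land inside the testing support in this setup, and that count is itself random, so the estimate fluctuates around the true value 0.5 from one seed to the next.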

Philip S. Thomas | Emma Brunskill