The value of information in multi-armed bandits with exponentially distributed rewards
[1] Warren B. Powell, et al. Approximate Dynamic Programming - Solving the Curses of Dimensionality, 2007.
[2] Jürgen Branke, et al. Sequential Sampling to Myopically Maximize the Expected Value of Information, 2010, INFORMS J. Comput.
[3] Warren B. Powell, et al. Paradoxes in Learning and the Marginal Value of Information, 2010, Decis. Anal.
[4] D. Berry, et al. Optimal designs for clinical trials with dichotomous responses, 1985, Statistics in Medicine.
[5] Warren B. Powell, et al. The knowledge gradient algorithm for online subset selection, 2009, 2009 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning.
[6] R. Agrawal. Sample mean based index policies with O(log n) regret for the multi-armed bandit problem, 1995, Advances in Applied Probability.
[7] S. Gupta, et al. Bayesian look ahead one-stage sampling allocations for selection of the best population, 1996.
[8] Yi-Ching Yao. Some results on the Gittins index for a normal reward process, 2007, math/0702831.
[9] Warren B. Powell, et al. A Monte Carlo knowledge gradient method for learning abatement potential of emissions reduction technologies, 2009, Proceedings of the 2009 Winter Simulation Conference (WSC).
[10] Qing Zhao, et al. Distributed Learning in Multi-Armed Bandit With Multiple Players, 2009, IEEE Transactions on Signal Processing.
[11] T. Lai. Adaptive treatment allocation and the multi-armed bandit problem, 1987.
[12] Peter I. Frazier, et al. The conjunction of the knowledge gradient and the economic approach to simulation selection, 2009, Proceedings of the 2009 Winter Simulation Conference (WSC).
[13] Warren B. Powell, et al. The Knowledge-Gradient Policy for Correlated Normal Beliefs, 2009, INFORMS J. Comput.
[14] Warren B. Powell, et al. The Knowledge Gradient Algorithm for a General Class of Online Learning Problems, 2012, Oper. Res.
[15] T. L. Lai and Herbert Robbins. Asymptotically Efficient Adaptive Allocation Rules, 1985.
[16] T. Lai, et al. Optimal learning and experimentation in bandit problems, 2000.
[17] Warren B. Powell, et al. On the robustness of a one-period look-ahead policy in multi-armed bandit problems, 2010, ICCS.
[18] Warren B. Powell, et al. A Knowledge-Gradient Policy for Sequential Information Collection, 2008, SIAM J. Control. Optim.
[19] J. Gittins, et al. The Learning Component of Dynamic Allocation Indices, 1992.
[20] Richard S. Sutton, et al. Dimensions of Reinforcement Learning, 1998.
[21] S. Gupta, et al. Bayesian look ahead one stage sampling allocations for selecting the largest normal mean, 1994.
[22] M. Degroot. Optimal Statistical Decisions, 1970.
[23] T. Lai, et al. Time series and related topics: in memory of Ching-Zong Wei, 2007, math/0703053.
[24] Warren B. Powell, et al. Information Collection on a Graph, 2011, Oper. Res.
[25] Peter Auer, et al. Finite-time Analysis of the Multiarmed Bandit Problem, 2002, Machine Learning.
[26] Evan L. Porteus, et al. Stalking Information: Bayesian Inventory Management with Unobserved Lost Sales, 1999.
[27] J. Gittins, et al. A dynamic allocation index for the discounted multiarmed bandit problem, 1979.
[28] Peter Key, et al. On the Bayesian Steady Forecasting Model, 1981.
[29] J. Bather, et al. Multi-Armed Bandit Allocation Indices, 1990.
[30] Andrew G. Barto, et al. Reinforcement learning, 1998.
[31] Benjamin Van Roy, et al. Dynamic Pricing with a Prior on Market Response, 2010, Oper. Res.
[32] Stephen E. Chick, et al. Economic Analysis of Simulation Selection Problems, 2009, Manag. Sci.