Bayesian Reinforcement Learning: A Survey
Mohammad Ghavamzadeh | Shie Mannor | Joelle Pineau | Aviv Tamar
[1] P. Lévy,et al. Calcul des Probabilités , 1926, The Mathematical Gazette.
[2] W. R. Thompson. On the likelihood that one unknown probability exceeds another in view of the evidence of two samples , 1933, Biometrika.
[3] Karl Johan Åström,et al. Optimal control of Markov processes with incomplete state information , 1965 .
[4] Edward J. Sondik,et al. The Optimal Control of Partially Observable Markov Processes over a Finite Horizon , 1973, Oper. Res..
[5] J. Gittins. Bandit processes and dynamic allocation indices , 1979 .
[6] Richard S. Sutton,et al. Neuronlike adaptive elements that can solve difficult learning control problems , 1983, IEEE Transactions on Systems, Man, and Cybernetics.
[7] Richard S. Sutton,et al. Temporal credit assignment in reinforcement learning , 1984 .
[8] Edward J. Wegman,et al. Statistical Signal Processing , 1985 .
[9] Peter W. Glynn,et al. Likelihood ratio gradient estimation for stochastic systems , 1990, CACM.
[10] A. O'Hagan,et al. Bayes–Hermite quadrature , 1991 .
[11] Richard S. Sutton. Dyna, an integrated architecture for learning, planning, and reacting , 1990, SIGART Bull.
[12] O. Zane. Discrete-time Bayesian adaptive control problems with complete observations , 1992, [1992] Proceedings of the 31st IEEE Conference on Decision and Control.
[13] Sean P. Meyn,et al. Bayesian adaptive control of time varying systems , 1992, [1992] Proceedings of the 31st IEEE Conference on Decision and Control.
[14] C. Atkeson,et al. Prioritized Sweeping: Reinforcement Learning with Less Data and Less Time , 1993, Machine Learning.
[15] J. Tsitsiklis. A short proof of the Gittins index theorem , 1993, Proceedings of 32nd IEEE Conference on Decision and Control.
[16] Gerald Tesauro,et al. TD-Gammon, a Self-Teaching Backgammon Program, Achieves Master-Level Play , 1994, Neural Computation.
[17] Mahesan Niranjan,et al. On-line Q-learning using connectionist systems , 1994 .
[18] Peter Norvig,et al. Artificial Intelligence: A Modern Approach , 1995 .
[19] Ben J. A. Kröse,et al. Learning from delayed rewards , 1995, Robotics Auton. Syst..
[20] Björn Wittenmark,et al. Adaptive Dual Control Methods: An Overview , 1995 .
[21] A. Guez,et al. Optimal adaptive control of uncertain stochastic linear systems , 1995, 1995 IEEE International Conference on Systems, Man and Cybernetics. Intelligent Systems for the 21st Century.
[22] John N. Tsitsiklis,et al. Neuro-Dynamic Programming , 1996, Encyclopedia of Machine Learning.
[23] Christopher G. Atkeson,et al. A comparison of direct and model-based reinforcement learning , 1997, Proceedings of International Conference on Robotics and Automation.
[24] Leslie Pack Kaelbling,et al. Planning and Acting in Partially Observable Stochastic Domains , 1998, Artif. Intell..
[25] Rich Caruana,et al. Multitask Learning , 1998, Encyclopedia of Machine Learning and Data Mining.
[26] Stuart J. Russell. Learning agents for uncertain environments (extended abstract) , 1998, COLT' 98.
[27] Yoram Singer,et al. Efficient Bayesian Parameter Estimation in Large Discrete Domains , 1998, NIPS.
[28] Jonathan Baxter. KnightCap: A chess program that learns by combining TD(λ) with game-tree search , 1998 .
[29] Richard S. Sutton,et al. Introduction to Reinforcement Learning , 1998 .
[30] Stuart J. Russell,et al. Bayesian Q-Learning , 1998, AAAI/IAAI.
[31] Alexander J. Smola,et al. Learning with kernels , 1998 .
[32] David Haussler,et al. Exploiting Generative Models in Discriminative Classifiers , 1998, NIPS.
[33] Andrew Tridgell,et al. KnightCap: A Chess Program That Learns by Combining TD(λ) with Game-Tree Search , 1998, ICML.
[34] David Andre,et al. Model based Bayesian Exploration , 1999, UAI.
[35] Ilan Rusnak. Optimal Adaptive Control of Uncertain Stochastic Discrete Linear Systems , 1999 .
[36] Justin A. Boyan,et al. Least-Squares Temporal Difference Learning , 1999, ICML.
[37] John N. Tsitsiklis,et al. Actor-Critic Algorithms , 1999, NIPS.
[38] Yishay Mansour,et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation , 1999, NIPS.
[39] N. Filatov,et al. Survey of adaptive dual control methods , 2000 .
[40] Malcolm J. A. Strens,et al. A Bayesian Framework for Reinforcement Learning , 2000, ICML.
[41] Jonathan Baxter,et al. A Model of Inductive Bias Learning , 2000, J. Artif. Intell. Res..
[42] Andrew Y. Ng,et al. Algorithms for Inverse Reinforcement Learning , 2000, ICML.
[43] Peter L. Bartlett,et al. Infinite-Horizon Policy-Gradient Estimation , 2001, J. Artif. Intell. Res..
[44] Peter L. Bartlett,et al. Experiments with Infinite-Horizon, Policy-Gradient Estimation , 2001, J. Artif. Intell. Res..
[45] Michael O. Duff,et al. Monte-Carlo Algorithms for the Improvement of Finite-State Stochastic Controllers: Application to Bayes-Adaptive Markov Decision Processes , 2001, AISTATS.
[46] Michael O. Duff. Optimal learning: Computational procedures for Bayes-adaptive Markov decision processes , 2002 .
[47] Carl E. Rasmussen,et al. Bayesian Monte Carlo , 2002, NIPS.
[48] Ronen I. Brafman,et al. R-MAX - A General Polynomial Time Algorithm for Near-Optimal Reinforcement Learning , 2001, J. Mach. Learn. Res..
[49] Shie Mannor,et al. Sparse Online Greedy Support Vector Regression , 2002, ECML.
[50] A. Greenfield,et al. Adaptive Control of Nonlinear Stochastic Systems by Particle Filtering , 2003, 2003 4th International Conference on Control and Automation Proceedings.
[51] S. Shankar Sastry,et al. Autonomous Helicopter Flight via Reinforcement Learning , 2003, NIPS.
[52] Carl E. Rasmussen,et al. Gaussian Processes in Reinforcement Learning , 2003, NIPS.
[53] Joelle Pineau,et al. Point-based value iteration: An anytime algorithm for POMDPs , 2003, IJCAI.
[54] Shie Mannor,et al. Bayes Meets Bellman: The Gaussian Process Approach to Temporal Difference Learning , 2003, ICML.
[55] John N. Tsitsiklis,et al. The Sample Complexity of Exploration in the Multi-Armed Bandit Problem , 2004, J. Mach. Learn. Res..
[56] Shie Mannor,et al. The Cross Entropy Method for Fast Policy Search , 2003, ICML.
[57] Nello Cristianini,et al. Kernel Methods for Pattern Analysis , 2004, Cambridge University Press.
[58] Michail G. Lagoudakis,et al. Least-Squares Policy Iteration , 2003, J. Mach. Learn. Res..
[59] Peter Norvig,et al. Artificial intelligence - a modern approach, 2nd Edition , 2003, Prentice Hall series in artificial intelligence.
[60] Craig Boutilier,et al. Coordination in multiagent reinforcement learning: a Bayesian approach , 2003, AAMAS '03.
[61] David A. McAllester. Some PAC-Bayesian Theorems , 1998, COLT' 98.
[62] Peter Auer,et al. Finite-time Analysis of the Multiarmed Bandit Problem , 2002, Machine Learning.
[63] Peter Stone,et al. Policy gradient reinforcement learning for fast quadrupedal locomotion , 2004, IEEE International Conference on Robotics and Automation, 2004. Proceedings. ICRA '04. 2004.
[64] Steven J. Bradtke,et al. Linear Least-Squares algorithms for temporal difference learning , 2004, Machine Learning.
[65] Ronald J. Williams,et al. Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning , 2004, Machine Learning.
[66] Yishay Mansour,et al. A Sparse Sampling Algorithm for Near-Optimal Planning in Large Markov Decision Processes , 1999, Machine Learning.
[67] Andrew G. Barto,et al. Elevator Group Control Using Multiple Reinforcement Learning Agents , 1998, Machine Learning.
[68] Michael Kearns,et al. Near-Optimal Reinforcement Learning in Polynomial Time , 1998, Machine Learning.
[69] Andrew W. Moore,et al. Prioritized Sweeping: Reinforcement Learning with Less Data and Less Time , 1993, Machine Learning.
[70] Pieter Abbeel,et al. Apprenticeship learning via inverse reinforcement learning , 2004, ICML.
[71] Sean R Eddy,et al. What is dynamic programming? , 2004, Nature Biotechnology.
[72] Nikos A. Vlassis,et al. Perseus: Randomized Point-based Value Iteration for POMDPs , 2005, J. Artif. Intell. Res..
[73] Joelle Pineau,et al. Active Learning in Partially Observable Markov Decision Processes , 2005, ECML.
[74] Pierre Geurts,et al. Tree-Based Batch Mode Reinforcement Learning , 2005, J. Mach. Learn. Res..
[75] Michael L. Littman,et al. A theoretical analysis of Model-Based Interval Estimation , 2005, ICML.
[76] Laurent El Ghaoui,et al. Robust Control of Markov Decision Processes with Uncertain Transition Matrices , 2005, Oper. Res..
[77] Tao Wang,et al. Bayesian sparse sampling for on-line reward optimization , 2005, ICML.
[78] Peter Szabó,et al. Learning to Control an Octopus Arm with Gaussian Process Temporal Difference Methods , 2005, NIPS.
[79] Yaakov Engel. Algorithms and Representations for Reinforcement Learning (Ph.D. thesis; with Hebrew abstract and additional Hebrew title page) , 2005 .
[80] Richard S. Sutton,et al. Learning to predict by the methods of temporal differences , 1988, Machine Learning.
[81] Shie Mannor,et al. Reinforcement learning with Gaussian processes , 2005, ICML.
[82] Brahim Chaib-draa,et al. An online POMDP algorithm for complex multiagent environments , 2005, AAMAS '05.
[83] Stefan Schaal,et al. Natural Actor-Critic , 2003, Neurocomputing.
[84] J. Andrew Bagnell,et al. Maximum margin planning , 2006, ICML.
[85] Pascal Poupart,et al. Point-Based Value Iteration for Continuous POMDPs , 2006, J. Mach. Learn. Res..
[86] Jesse Hoey,et al. An analytic solution to discrete Bayesian reinforcement learning , 2006, ICML.
[87] Mohammad Ghavamzadeh,et al. Bayesian Policy Gradient Algorithms , 2006, NIPS.
[88] Csaba Szepesvári,et al. Bandit Based Monte-Carlo Planning , 2006, ECML.
[89] Doina Precup,et al. Using Linear Programming for Bayesian Exploration in Markov Decision Processes , 2007, IJCAI.
[90] Shalabh Bhatnagar,et al. Incremental Natural Actor-Critic Algorithms , 2007, NIPS.
[91] John N. Tsitsiklis,et al. Bias and Variance Approximation in Value Function Estimates , 2007, Manag. Sci..
[92] Mohammad Ghavamzadeh,et al. Bayesian actor-critic algorithms , 2007, ICML '07.
[93] Robert E. Schapire,et al. A Game-Theoretic Approach to Apprenticeship Learning , 2007, NIPS.
[94] Joelle Pineau,et al. Bayes-Adaptive POMDPs , 2007, NIPS.
[95] Alan Fern,et al. Multi-task reinforcement learning: a hierarchical Bayesian approach , 2007, ICML '07.
[96] Csaba Szepesvári,et al. Apprenticeship Learning using Inverse Reinforcement Learning and Gradient Methods , 2007, UAI.
[97] Eyal Amir,et al. Bayesian Inverse Reinforcement Learning , 2007, IJCAI.
[98] Joelle Pineau,et al. Online Planning Algorithms for POMDPs , 2008, J. Artif. Intell. Res..
[99] Anind K. Dey,et al. Maximum Entropy Inverse Reinforcement Learning , 2008, AAAI.
[100] Michael L. Littman,et al. An analysis of model-based Interval Estimation for Markov Decision Processes , 2008, J. Comput. Syst. Sci..
[101] Sriraam Natarajan,et al. Transfer in variable-reward hierarchical reinforcement learning , 2008, Machine Learning.
[102] Joelle Pineau,et al. Model-Based Bayesian Reinforcement Learning in Large Structured Domains , 2008, UAI.
[103] Risto Miikkulainen,et al. Online kernel selection for Bayesian reinforcement learning , 2008, ICML '08.
[104] Joelle Pineau,et al. Bayesian reinforcement learning in continuous POMDPs with application to robot navigation , 2008, 2008 IEEE International Conference on Robotics and Automation.
[105] Joelle Pineau,et al. Reinforcement learning with limited reinforcement: using Bayes risk for active learning in POMDPs , 2008, ICML '08.
[106] Shie Mannor,et al. Regularized Policy Iteration , 2008, NIPS.
[107] Don H. Johnson,et al. Statistical Signal Processing , 2009, Encyclopedia of Biometrics.
[108] Shalabh Bhatnagar,et al. Natural actor-critic algorithms , 2009, Autom..
[109] Andrew Y. Ng,et al. Near-Bayesian exploration in polynomial time , 2009, ICML '09.
[110] Finale Doshi-Velez,et al. The Infinite Partially Observable Markov Decision Process , 2009, NIPS.
[111] Shie Mannor,et al. Regularized Fitted Q-iteration: Application to Planning , 2008, EWRL.
[112] Lihong Li,et al. A Bayesian Sampling Approach to Exploration in Reinforcement Learning , 2009, UAI.
[113] Joelle Pineau,et al. A bayesian reinforcement learning approach for customizing human-robot interfaces , 2009, IUI.
[114] Michael L. Littman,et al. A unifying framework for computational reinforcement learning theory , 2009 .
[115] Carl E. Rasmussen,et al. Gaussian processes for machine learning , 2005, Adaptive computation and machine learning.
[116] Brahim Chaib-draa,et al. Bayesian reinforcement learning in continuous POMDPs with gaussian processes , 2009, 2009 IEEE/RSJ International Conference on Intelligent Robots and Systems.
[117] Richard L. Lewis,et al. Variance-Based Rewards for Approximate Bayesian Reinforcement Learning , 2010, UAI.
[118] Joaquin Quiñonero Candela,et al. Web-Scale Bayesian Click-Through Rate Prediction for Sponsored Search Advertising in Microsoft's Bing Search Engine , 2010, ICML.
[119] Alessandro Lazaric,et al. Bayesian Multi-Task Reinforcement Learning , 2010, ICML.
[120] Nicholas R. Jennings,et al. Cooperative Games with Overlapping Coalitions , 2010, J. Artif. Intell. Res..
[121] John N. Tsitsiklis,et al. Linearly Parameterized Bandits , 2008, Math. Oper. Res..
[122] Joshua B. Tenenbaum,et al. Nonparametric Bayesian Policy Priors for Reinforcement Learning , 2010, NIPS.
[123] Doina Precup,et al. Smarter Sampling in Model-Based Bayesian Reinforcement Learning , 2010, ECML/PKDD.
[124] U. Rieder,et al. Markov Decision Processes , 2010 .
[125] Thomas J. Walsh,et al. Integrating Sample-Based Planning and Model-Based Reinforcement Learning , 2010, AAAI.
[126] Shie Mannor,et al. Percentile Optimization for Markov Decision Processes with Parameter Uncertainty , 2010, Oper. Res..
[127] Joelle Pineau,et al. PAC-Bayesian Model Selection for Reinforcement Learning , 2010, NIPS.
[128] Steven L. Scott,et al. A modern Bayesian look at the multi-armed bandit , 2010 .
[129] John Shawe-Taylor,et al. PAC-Bayesian Analysis of the Exploration-Exploitation Trade-off , 2011, ICML 2011.
[130] John Shawe-Taylor,et al. PAC-Bayesian Analysis of Contextual Bandits , 2011, NIPS.
[131] Christos Dimitrakakis,et al. Bayesian Multitask Inverse Reinforcement Learning , 2011, EWRL.
[132] TaeChoong Chung,et al. Hessian matrix distribution for Bayesian policy gradient reinforcement learning , 2011, Inf. Sci..
[133] Joelle Pineau,et al. Bayesian reinforcement learning for POMDP-based dialogue systems , 2011, 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[134] José Niño-Mora. Computing a Classic Index for Finite-Horizon Bandits , 2011 .
[135] Warren B. Powell. Approximate Dynamic Programming: Solving the Curses of Dimensionality , 2007, Wiley Series in Probability and Statistics.
[136] Kee-Eung Kim,et al. MAP Inference for Bayesian Inverse Reinforcement Learning , 2011, NIPS.
[137] Eduardo F. Morales,et al. An Introduction to Reinforcement Learning , 2011 .
[138] Michael L. Littman,et al. Apprenticeship Learning About Multiple Intentions , 2011, ICML.
[139] Lihong Li,et al. An Empirical Evaluation of Thompson Sampling , 2011, NIPS.
[140] Joelle Pineau,et al. PAC-Bayesian Policy Evaluation for Reinforcement Learning , 2011, UAI.
[141] José Niño-Mora,et al. Computing a Classic Index for Finite-Horizon Bandits , 2011, INFORMS J. Comput..
[142] Joelle Pineau,et al. A Bayesian Approach for Learning and Planning in Partially Observable Markov Decision Processes , 2011, J. Mach. Learn. Res..
[143] Olivier Buffet,et al. Near-Optimal BRL using Optimistic Local Transitions , 2012, ICML.
[144] Rémi Munos,et al. Thompson Sampling: An Asymptotically Optimal Finite-Time Analysis , 2012, ALT.
[145] L. F. Bertuccelli,et al. Robust Adaptive Markov Decision Processes: Planning with Model Uncertainty , 2012, IEEE Control Systems.
[146] Jonathan P. How,et al. Improving the efficiency of Bayesian inverse reinforcement learning , 2012, 2012 IEEE International Conference on Robotics and Automation.
[147] Aurélien Garivier,et al. On Bayesian Upper Confidence Bounds for Bandit Problems , 2012, AISTATS.
[148] Peter Dayan,et al. Efficient Bayes-Adaptive Reinforcement Learning using Sample-Based Search , 2012, NIPS.
[149] Kee-Eung Kim,et al. Nonparametric Bayesian Inverse Reinforcement Learning for Multiple Reward Functions , 2012, NIPS.
[150] Sébastien Bubeck,et al. Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems , 2012, Found. Trends Mach. Learn..
[151] Michèle Sebag,et al. The grand challenge of computer Go , 2012, Commun. ACM.
[152] Jonathan P. How,et al. Bayesian Nonparametric Inverse Reinforcement Learning , 2012, ECML/PKDD.
[153] Shipra Agrawal,et al. Analysis of Thompson Sampling for the Multi-armed Bandit Problem , 2011, COLT.
[154] Michael H. Bowling,et al. Tractable Objectives for Robust Policy Optimization , 2012, NIPS.
[155] Lucian Busoniu,et al. Optimistic planning for belief-augmented Markov Decision Processes , 2013, 2013 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL).
[156] Jan Peters,et al. Reinforcement learning in robotics: A survey , 2013, Int. J. Robotics Res..
[157] Benjamin Van Roy,et al. (More) Efficient Reinforcement Learning via Posterior Sampling , 2013, NIPS.
[158] J. Asmuth. Model-based Bayesian Reinforcement Learning with Generalized Priors , 2013 .
[159] Shipra Agrawal,et al. Further Optimal Regret Bounds for Thompson Sampling , 2012, AISTATS.
[160] Kenji Kawaguchi,et al. A Greedy Approximation of Bayesian Reinforcement Learning with Probably Optimistic Transition Model , 2013, ArXiv.
[161] Liang Tang,et al. Automatic ad format selection via contextual bandits , 2013, CIKM.
[162] Shipra Agrawal,et al. Thompson Sampling for Contextual Bandits with Linear Payoffs , 2012, ICML.
[163] Sudipto Guha,et al. Stochastic Regret Minimization via Thompson Sampling , 2014, COLT.
[164] Benjamin Van Roy,et al. Learning to Optimize via Posterior Sampling , 2013, Math. Oper. Res..
[165] Shie Mannor,et al. Thompson Sampling for Complex Online Problems , 2013, ICML.
[166] Shie Mannor,et al. Thompson Sampling for Learning Parameterized Markov Decision Processes , 2014, COLT.
[167] Csaba Szepesvári,et al. Bayesian Optimal Control of Smoothly Parameterized Systems , 2015, UAI.
[168] Michal Valko,et al. Bayesian Policy Gradient and Actor-Critic Algorithms , 2016, J. Mach. Learn. Res..
[169] Lihong Li,et al. On the Prior Sensitivity of Thompson Sampling , 2015, ALT.
[170] Damien Ernst,et al. Benchmarking for Bayesian Reinforcement Learning , 2016, PloS one.
[171] Benjamin Van Roy,et al. An Information-Theoretic Analysis of Thompson Sampling , 2014, J. Mach. Learn. Res..
[172] T. L. Lai, Herbert Robbins. Asymptotically efficient adaptive allocation rules , 1985, Advances in Applied Mathematics.