Efficient Bayesian Nonparametric Methods for Model-Free Reinforcement Learning in Centralized and Decentralized Sequential Environments
[1] Anne Condon,et al. On the Undecidability of Probabilistic Planning and Infinite-Horizon Partially Observable Markov Decision Problems , 1999, AAAI/IAAI.
[2] Edward J. Sondik,et al. The Optimal Control of Partially Observable Markov Processes over a Finite Horizon , 1973, Oper. Res..
[3] Christian P. Robert,et al. Monte Carlo Statistical Methods , 2005, Springer Texts in Statistics.
[4] Doina Precup,et al. Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning , 1999, Artif. Intell..
[5] Michail G. Lagoudakis,et al. Least-Squares Policy Iteration , 2003, J. Mach. Learn. Res..
[6] Eric Moulines,et al. On‐line expectation–maximization algorithm for latent data models , 2007, ArXiv.
[7] Nicholas R. J. Lawrance,et al. Gaussian processes for informative exploration in reinforcement learning , 2013, 2013 IEEE International Conference on Robotics and Automation.
[8] Peter Dayan,et al. Scalable and Efficient Bayes-Adaptive Reinforcement Learning Based on Monte-Carlo Tree Search , 2013, J. Artif. Intell. Res..
[9] Kee-Eung Kim,et al. Solving POMDPs by Searching the Space of Finite Policies , 1999, UAI.
[10] Yoram Singer,et al. The Hierarchical Hidden Markov Model: Analysis and Applications , 1998, Machine Learning.
[11] Matthijs T. J. Spaan,et al. Multi-robot planning under uncertainty with communication: a case study , 2010 .
[12] Edward J. Sondik,et al. The Optimal Control of Partially Observable Markov Processes over the Infinite Horizon: Discounted Costs , 1978, Oper. Res..
[13] Hui Li,et al. Multi-task Reinforcement Learning in Partially Observable Stochastic Environments , 2009, J. Mach. Learn. Res..
[14] W. Haddad,et al. Nonlinear Dynamical Systems and Control: A Lyapunov-Based Approach , 2008 .
[15] Shlomo Zilberstein,et al. Increasing scalability in algorithms for centralized and decentralized partially observable Markov decision processes: efficient decision-making and coordination in uncertain environments , 2010 .
[16] Michael Kearns,et al. Near-Optimal Reinforcement Learning in Polynomial Time , 2002, Machine Learning.
[17] Marc Peter Deisenroth,et al. Efficient reinforcement learning using Gaussian processes , 2010 .
[18] Carlos Guestrin,et al. Multiagent Planning with Factored MDPs , 2001, NIPS.
[19] Eric Wiewiora,et al. Learning predictive representations from a history , 2005, ICML.
[20] Qiang Yang,et al. A Survey on Transfer Learning , 2010, IEEE Transactions on Knowledge and Data Engineering.
[21] Joelle Pineau,et al. Anytime Point-Based Approximations for Large POMDPs , 2006, J. Artif. Intell. Res..
[22] Steve J. Young,et al. USING POMDPS FOR DIALOG MANAGEMENT , 2006, 2006 IEEE Spoken Language Technology Workshop.
[23] Peter Dayan,et al. Q-learning , 1992, Machine Learning.
[24] Lawrence Carin,et al. Learning to Explore and Exploit in POMDPs , 2009, NIPS.
[25] Gavin Taylor,et al. Kernelized value function approximation for reinforcement learning , 2009, ICML '09.
[26] L. Carin,et al. Transfer Learning for Reinforcement Learning with Dependent Dirichlet Process and Gaussian Process , 2012 .
[27] Chong Wang,et al. Variational inference in nonconjugate models , 2012, J. Mach. Learn. Res..
[28] Leslie Pack Kaelbling,et al. Bayesian Policy Search with Policy Priors , 2011, IJCAI.
[29] U. Rieder,et al. Markov Decision Processes , 2010 .
[30] Lancelot F. James,et al. Gibbs Sampling Methods for Stick-Breaking Priors , 2001 .
[31] Brahim Chaib-draa,et al. Predictive representations for policy gradient in POMDPs , 2009, ICML '09.
[32] Jonathan P. How,et al. Dynamic Clustering via Asymptotics of the Dependent Dirichlet Process Mixture , 2013, NIPS.
[33] P. Olver. Nonlinear Systems , 2013 .
[34] T. Ferguson. A Bayesian Analysis of Some Nonparametric Problems , 1973 .
[35] Kee-Eung Kim,et al. Learning Finite-State Controllers for Partially Observable Environments , 1999, UAI.
[36] Chong Wang,et al. Online Variational Inference for the Hierarchical Dirichlet Process , 2011, AISTATS.
[37] Douglas Aberdeen,et al. Scalable Internal-State Policy-Gradient Methods for POMDPs , 2002, ICML.
[38] Richard S. Sutton,et al. Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.
[39] Andrew W. Moore,et al. Generalization in Reinforcement Learning: Safely Approximating the Value Function , 1994, NIPS.
[40] Marc Toussaint,et al. Model-free reinforcement learning as mixture learning , 2009, ICML '09.
[41] D. Rubin,et al. Maximum likelihood from incomplete data via the EM algorithm (with discussion) , 1977 .
[42] Pierre Geurts,et al. Tree-Based Batch Mode Reinforcement Learning , 2005, J. Mach. Learn. Res..
[43] Neil Immerman,et al. The Complexity of Decentralized Control of Markov Decision Processes , 2000, UAI.
[44] Reid G. Simmons,et al. Heuristic Search Value Iteration for POMDPs , 2004, UAI.
[45] Nikos A. Vlassis,et al. The Cross-Entropy Method for Policy Search in Decentralized POMDPs , 2008, Informatica.
[46] Joelle Pineau,et al. Towards robotic assistants in nursing homes: Challenges and results , 2003, Robotics Auton. Syst..
[47] Feng Wu,et al. Monte-Carlo Expectation Maximization for Decentralized POMDPs , 2013, IJCAI.
[48] Carl E. Rasmussen,et al. Sparse Spectrum Gaussian Process Regression , 2010, J. Mach. Learn. Res..
[49] Ronald E. Parr,et al. Hierarchical control and learning for Markov decision processes , 1998 .
[50] Chong Wang,et al. Stochastic variational inference , 2012, J. Mach. Learn. Res..
[51] Peter Szabó,et al. Learning to Control an Octopus Arm with Gaussian Process Temporal Difference Methods , 2005, NIPS.
[52] Peter Stone,et al. Transfer Learning for Reinforcement Learning Domains: A Survey , 2009, J. Mach. Learn. Res..
[53] D. Blackwell. Discounted Dynamic Programming , 1965 .
[54] D. J. White,et al. A Survey of Applications of Markov Decision Processes , 1993 .
[55] Shalabh Bhatnagar,et al. Fast gradient-descent methods for temporal-difference learning with linear function approximation , 2009, ICML '09.
[56] Girish Chowdhary,et al. Off-policy reinforcement learning with Gaussian processes , 2014, IEEE/CAA Journal of Automatica Sinica.
[57] Marc Toussaint,et al. Hierarchical POMDP Controller Optimization by Likelihood Maximization , 2008, UAI.
[58] Andrew Y. Ng,et al. Regularization and feature selection in least-squares temporal difference learning , 2009, ICML '09.
[59] Theodore J. Perkins,et al. Reinforcement learning for POMDPs based on action values and stochastic optimization , 2002, AAAI/IAAI.
[60] Pascal Poupart,et al. Model-based Bayesian Reinforcement Learning in Partially Observable Domains , 2008, ISAIM.
[61] Martin L. Puterman,et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming , 1994 .
[62] Carl E. Rasmussen,et al. Gaussian processes for machine learning , 2005, Adaptive computation and machine learning.
[63] Joshua B. Tenenbaum,et al. Nonparametric Bayesian Policy Priors for Reinforcement Learning , 2010, NIPS.
[64] Shalabh Bhatnagar,et al. Toward Off-Policy Learning Control with Function Approximation , 2010, ICML.
[65] Jesse Hoey,et al. An analytic solution to discrete Bayesian reinforcement learning , 2006, ICML.
[66] Guy Shani,et al. Model-Based Online Learning of POMDPs , 2005, ECML.
[67] Shie Mannor,et al. Reinforcement learning with Gaussian processes , 2005, ICML.
[68] Leemon C. Baird,et al. Residual Algorithms: Reinforcement Learning with Function Approximation , 1995, ICML.
[69] Shlomo Zilberstein,et al. Improved Memory-Bounded Dynamic Programming for Decentralized POMDPs , 2007, UAI.
[70] Joelle Pineau,et al. Point-based value iteration: An anytime algorithm for POMDPs , 2003, IJCAI.
[71] Sebastian Thrun,et al. Efficient Exploration In Reinforcement Learning , 1992 .
[72] John N. Tsitsiklis,et al. Analysis of temporal-difference learning with function approximation , 1996, NIPS 1996.
[73] Ke Jiang,et al. Small-Variance Asymptotics for Hidden Markov Models , 2013, NIPS.
[74] Nikos A. Vlassis,et al. Perseus: Randomized Point-based Value Iteration for POMDPs , 2005, J. Artif. Intell. Res..
[75] Richard S. Sutton,et al. Predictive Representations of State , 2001, NIPS.
[76] Craig Boutilier,et al. Bounded Finite State Controllers , 2003, NIPS.
[77] Bikramjit Banerjee,et al. Sample Bounded Distributed Reinforcement Learning for Decentralized POMDPs , 2012, AAAI.
[78] Matthew J. Johnson,et al. Bayesian nonparametric hidden semi-Markov models , 2012, J. Mach. Learn. Res..
[79] Carl E. Rasmussen,et al. Gaussian Processes in Reinforcement Learning , 2003, NIPS.
[80] Bo Liu,et al. Regularized Off-Policy TD-Learning , 2012, NIPS.
[81] Shlomo Zilberstein,et al. Planetary Rover Control as a Markov Decision Process , 2002 .
[82] Sebastiaan A. Terwijn,et al. On the Learnability of Hidden Markov Models , 2002, ICGI.
[83] Lihong Li,et al. A Bayesian Sampling Approach to Exploration in Reinforcement Learning , 2009, UAI.
[84] Shlomo Zilberstein,et al. Policy Iteration for Decentralized Control of Markov Decision Processes , 2009, J. Artif. Intell. Res..
[85] Dan Lizotte,et al. Convergent Fitted Value Iteration with Linear Function Approximation , 2011, NIPS.
[86] Jason Pazis,et al. PAC Optimal Exploration in Continuous Space Markov Decision Processes , 2013, AAAI.
[87] Lehel Csató,et al. Sparse On-Line Gaussian Processes , 2002, Neural Computation.
[88] Michael L. Littman,et al. A theoretical analysis of Model-Based Interval Estimation , 2005, ICML.
[89] Shlomo Zilberstein,et al. Dynamic Programming for Partially Observable Stochastic Games , 2004, AAAI.
[90] Michael I. Jordan,et al. Hierarchical Dirichlet Processes , 2006 .
[91] Leslie Pack Kaelbling,et al. Planning and Acting in Partially Observable Stochastic Domains , 1998, Artif. Intell..
[92] Hui Li,et al. Region-based value iteration for partially observable Markov decision processes , 2006, ICML.
[93] Matthew J. Beal. Variational algorithms for approximate Bayesian inference , 2003 .
[94] Padhraic Smyth,et al. Learning concept graphs from text with stick-breaking priors , 2010, NIPS.
[95] Bart De Schutter,et al. Reinforcement Learning and Dynamic Programming Using Function Approximators , 2010 .
[96] Michael L. Littman,et al. A unifying framework for computational reinforcement learning theory , 2009 .
[97] Jaakko Peltonen,et al. Periodic Finite State Controllers for Efficient POMDP and DEC-POMDP Planning , 2011, NIPS.
[98] Lydia E. Kavraki,et al. Automated model approximation for robotic navigation with POMDPs , 2013, 2013 IEEE International Conference on Robotics and Automation.
[99] G. Roberts,et al. Retrospective Markov chain Monte Carlo methods for Dirichlet process hierarchical models , 2007, arXiv:0710.4228.
[100] Shlomo Zilberstein,et al. Anytime Planning for Decentralized POMDPs using Expectation Maximization , 2010, UAI.
[101] Victor R. Lesser,et al. Coordinated Multi-Agent Reinforcement Learning in Networked Distributed POMDPs , 2011, AAAI.
[102] Nasser M. Nasrabadi,et al. Pattern Recognition and Machine Learning , 2006, Technometrics.
[103] John N. Tsitsiklis,et al. The Complexity of Markov Decision Processes , 1987, Math. Oper. Res..
[104] Siu-Yeung Cho,et al. A Modified Memory-Based Reinforcement Learning Method for Solving POMDP Problems , 2011, Neural Processing Letters.
[105] Shie Mannor,et al. The kernel recursive least-squares algorithm , 2004, IEEE Transactions on Signal Processing.
[106] É. Moulines,et al. Convergence of a stochastic approximation version of the EM algorithm , 1999 .
[107] Yee Whye Teh,et al. Infinite Hierarchical Hidden Markov Models , 2009, AISTATS.
[108] Byron Boots,et al. Spectral Approaches to Learning Predictive Representations , 2011 .
[109] Nicholas R. Jennings,et al. Decentralized Bayesian reinforcement learning for online agent collaboration , 2012, AAMAS.
[110] Jonathan P. How,et al. Decentralized control of partially observable Markov decision processes , 2013, 52nd IEEE Conference on Decision and Control.
[111] Frans A. Oliehoek,et al. Decentralized POMDPs , 2012, Reinforcement Learning.
[112] Joel Veness,et al. Monte-Carlo Planning in Large POMDPs , 2010, NIPS.
[113] Michael I. Jordan,et al. Small-Variance Asymptotics for Exponential Family Dirichlet Process Mixture Models , 2012, NIPS.
[114] Michael I. Jordan,et al. MAD-Bayes: MAP-based Asymptotic Derivations from Bayes , 2012, ICML.
[115] Sebastian Thrun,et al. Is Learning The n-th Thing Any Easier Than Learning The First? , 1995, NIPS.
[116] Marco Wiering,et al. Utile distinction hidden Markov models , 2004, ICML.
[117] Pierre Priouret,et al. Adaptive Algorithms and Stochastic Approximations , 1990, Applications of Mathematics.
[118] Stuart J. Russell,et al. Reinforcement Learning with Hierarchies of Machines , 1997, NIPS.
[119] Lawrence R. Rabiner,et al. A tutorial on hidden Markov models and selected applications in speech recognition , 1989, Proc. IEEE.
[120] Hui Li,et al. Point-Based Policy Iteration , 2007, AAAI.
[121] John Langford,et al. Exploration in Metric State Spaces , 2003, ICML.
[122] Jonathan P. How,et al. Planning for decentralized control of multiple robots under uncertainty , 2015, IEEE International Conference on Robotics and Automation (ICRA).
[123] Andrew McCallum,et al. Reinforcement learning with selective perception and hidden state , 1996 .
[124] Charles L. Isbell,et al. Point Based Value Iteration with Optimal Belief Compression for Dec-POMDPs , 2013, NIPS.
[125] Bikramjit Banerjee,et al. Pruning for Monte Carlo Distributed Reinforcement Learning in Decentralized POMDPs , 2013, AAAI.
[126] Michael I. Jordan,et al. Revisiting k-means: New Algorithms via Bayesian Nonparametrics , 2011, ICML.
[127] Lawrence Carin,et al. Online Expectation Maximization for Reinforcement Learning in POMDPs , 2013, IJCAI.
[129] Thomas J. Walsh,et al. Exploring compact reinforcement-learning representations with linear regression , 2009, UAI.
[130] Frans A. Oliehoek,et al. Value-Based Planning for Teams of Agents in Stochastic Partially Observable Environments , 2010 .
[131] Lawrence Carin,et al. Hidden Markov Models With Stick-Breaking Priors , 2009, IEEE Transactions on Signal Processing.
[132] Tzu-Tsung Wong,et al. Generalized Dirichlet distribution in Bayesian analysis , 1998, Appl. Math. Comput..
[133] Peter Stone,et al. Gaussian Processes for Sample Efficient Reinforcement Learning with RMAX-like Exploration , 2010, ECML/PKDD.
[134] Sean P. Meyn,et al. An analysis of reinforcement learning with function approximation , 2008, ICML '08.
[135] Finale Doshi-Velez,et al. The Infinite Partially Observable Markov Decision Process , 2009, NIPS.
[136] Andrew Y. Ng,et al. Near-Bayesian exploration in polynomial time , 2009, ICML '09.
[137] François Charpillet,et al. MAA*: A Heuristic Search Algorithm for Solving Decentralized POMDPs , 2005, UAI.
[138] Lihong Li,et al. PAC model-free reinforcement learning , 2006, ICML.
[139] Peter Norvig,et al. Artificial Intelligence: A Modern Approach , 1995 .
[140] Ronen I. Brafman,et al. R-MAX - A General Polynomial Time Algorithm for Near-Optimal Reinforcement Learning , 2001, J. Mach. Learn. Res..
[141] Makoto Yokoo,et al. Networked Distributed POMDPs: A Synergy of Distributed Constraint Optimization and POMDPs , 2005, IJCAI.
[142] David B. Dunson,et al. Approximate Dynamic Programming for Storage Problems , 2011, ICML.
[143] A. Cassandra. A Survey of POMDP Applications , 2003 .
[144] Lawrence Carin,et al. The Infinite Regionalized Policy Representation , 2011, ICML.
[145] H. Kushner,et al. Stochastic Approximation and Recursive Algorithms and Applications , 2003 .
[146] David Hsu,et al. DESPOT: Online POMDP Planning with Regularization , 2013, NIPS.
[147] Feng Qi (祁锋). Bounds for the Ratio of Two Gamma Functions , 2009 .
[148] Shlomo Zilberstein,et al. Optimizing fixed-size stochastic controllers for POMDPs and decentralized POMDPs , 2010, Autonomous Agents and Multi-Agent Systems.
[149] Eric A. Hansen,et al. Solving POMDPs by Searching in Policy Space , 1998, UAI.
[150] Leslie Pack Kaelbling,et al. Spatial and Temporal Abstractions in POMDPs Applied to Robot Navigation , 2005 .
[151] Joelle Pineau,et al. A Bayesian Approach for Learning and Planning in Partially Observable Markov Decision Processes , 2011, J. Mach. Learn. Res..
[152] R. E. Kalman. A New Approach to Linear Filtering and Prediction Problems , 1960, Journal of Basic Engineering.
[153] Andre Wibisono,et al. Streaming Variational Bayes , 2013, NIPS.
[154] Zoubin Ghahramani,et al. Sparse Gaussian Processes using Pseudo-inputs , 2005, NIPS.
[155] Leslie Pack Kaelbling,et al. Planning with macro-actions in decentralized POMDPs , 2014, AAMAS.