Reinforcement Learning of POMDPs using Spectral Methods
Kamyar Azizzadenesheli | Anima Anandkumar | Alessandro Lazaric
[1] Joelle Pineau, et al. Efficient learning and planning with compressed predictive states, 2013, J. Mach. Learn. Res.
[2] Richard S. Sutton, et al. Predictive Representations of State, 2001, NIPS.
[3] Satinder P. Singh, et al. Experimental Results on Learning Stochastic Memoryless Policies for Partially Observable Markov Decision Processes, 1998, NIPS.
[4] Michael I. Jordan, et al. Learning Without State-Estimation in Partially Observable Markovian Decision Processes, 1994, ICML.
[5] Emma Brunskill, et al. A PAC RL Algorithm for Episodic POMDPs, 2016, AISTATS.
[6] Alessandro Lazaric, et al. Regret Bounds for Reinforcement Learning with Policy Advice, 2013, ECML/PKDD.
[7] John N. Tsitsiklis, et al. The Complexity of Markov Decision Processes, 1987, Math. Oper. Res.
[8] Ronen I. Brafman, et al. R-MAX - A General Polynomial Time Algorithm for Near-Optimal Reinforcement Learning, 2001, J. Mach. Learn. Res.
[9] Joelle Pineau, et al. Building Adaptive Dialogue Systems Via Bayes-Adaptive POMDPs, 2012, IEEE Journal of Selected Topics in Signal Processing.
[10] Michael L. Littman, et al. Memoryless policies: theoretical limitations and practical results, 1994.
[11] Pascal Poupart, et al. Model-based Bayesian Reinforcement Learning in Partially Observable Domains, 2008, ISAIM.
[12] Ambuj Tewari, et al. REGAL: A Regularization based Algorithm for Reinforcement Learning in Weakly Communicating MDPs, 2009, UAI.
[13] Le Song, et al. Nonparametric Estimation of Multi-View Latent Variable Models, 2013, ICML.
[14] Craig Boutilier, et al. Bounded Finite State Controllers, 2003, NIPS.
[15] Peter L. Bartlett, et al. Infinite-Horizon Policy-Gradient Estimation, 2001, J. Artif. Intell. Res.
[16] Joelle Pineau, et al. Bayes-Adaptive POMDPs, 2007, NIPS.
[17] Alessandro Lazaric, et al. Sequential Transfer in Multi-armed Bandit with Finite Set of Models, 2013, NIPS.
[18] John Langford, et al. Contextual-MDPs for PAC-Reinforcement Learning with Rich Observations, 2016, ArXiv.
[19] Csaba Szepesvári, et al. Regret Bounds for the Adaptive Control of Linear Quadratic Systems, 2011, COLT.
[20] Anima Anandkumar, et al. A Method of Moments for Mixture Models and Hidden Markov Models, 2012, COLT.
[21] Peter Auer, et al. Near-optimal Regret Bounds for Reinforcement Learning, 2008, J. Mach. Learn. Res.
[22] Wei Chen, et al. Combinatorial Multi-Armed Bandit: General Framework and Applications, 2013, ICML.
[23] Eric Deeson, et al. Online learning, 2005, Br. J. Educ. Technol.
[24] Milos Hauskrecht, et al. Planning treatment of ischemic heart disease with partially observable Markov decision processes, 2000, Artif. Intell. Medicine.
[25] Joelle Pineau, et al. Efficient Planning and Tracking in POMDPs with Large Observation Spaces, 2006.
[26] Alessandro Lazaric, et al. Stochastic Optimization of a Locally Smooth Function under Correlated Bandit Feedback, 2014, ArXiv.
[27] D. Rubin, et al. Maximum likelihood from incomplete data via the EM algorithm plus discussions on the paper, 1977.
[28] Omid Madani. On the Computability of Infinite-Horizon Partially Observable Markov Decision Processes, 2007.
[29] Joel A. Tropp, et al. User-Friendly Tail Bounds for Sums of Random Matrices, 2010, Found. Comput. Math.
[30] L. Tong, et al. Online Learning and Optimization of Markov Jump Affine Models, 2016, ArXiv.
[31] John Loch, et al. Using Eligibility Traces to Find the Best Memoryless Policy in Partially Observable Markov Decision Processes, 1998, ICML.
[32] Pascal Poupart, et al. Partially Observable Markov Decision Processes, 2010, Encyclopedia of Machine Learning.
[33] Csaba Szepesvári, et al. Improved Algorithms for Linear Stochastic Bandits, 2011, NIPS.
[34] Peter Auer, et al. Finite-time Analysis of the Multiarmed Bandit Problem, 2002, Machine Learning.
[35] Peter Auer, et al. Logarithmic Online Regret Bounds for Undiscounted Reinforcement Learning, 2006, NIPS.
[36] Michael Kearns, et al. Near-Optimal Reinforcement Learning in Polynomial Time, 2002, Machine Learning.
[37] Aryeh Kontorovich, et al. On learning parametric-output HMMs, 2013, ICML.
[38] Aryeh Kontorovich, et al. Uniform Chernoff and Dvoretzky-Kiefer-Wolfowitz-Type Inequalities for Markov Chains and Related Processes, 2012, J. Appl. Probab.
[39] Ronald Ortner, et al. Selecting Near-Optimal Approximate State Representations in Reinforcement Learning, 2014, ALT.
[40] Theodore J. Perkins, et al. Reinforcement learning for POMDPs based on action values and stochastic optimization, 2002, AAAI/IAAI.
[41] Alex Graves, et al. Playing Atari with Deep Reinforcement Learning, 2013, ArXiv.
[42] Csaba Szepesvári, et al. Bandit Based Monte-Carlo Planning, 2006, ECML.
[43] Richard S. Sutton, et al. Neuronlike adaptive elements that can solve difficult learning control problems, 1983, IEEE Transactions on Systems, Man, and Cybernetics.
[44] L. Meng, et al. The optimal perturbation bounds of the Moore–Penrose inverse under the Frobenius norm, 2010.
[45] Michael I. Jordan, et al. PEGASUS: A policy search method for large MDPs and POMDPs, 2000, UAI.
[46] Eduardo F. Morales, et al. An Introduction to Reinforcement Learning, 2011.
[47] Anima Anandkumar, et al. Tensor decompositions for learning latent variable models, 2012, J. Mach. Learn. Res.
[48] K. Ramanan, et al. Concentration Inequalities for Dependent Random Variables via the Martingale Method, 2006, math/0609835.
[49] John N. Tsitsiklis, et al. Neuro-Dynamic Programming, 1996, Encyclopedia of Machine Learning.
[50] Hongsheng Xi, et al. Finding optimal memoryless policies of POMDPs under the expected average reward criterion, 2011, Eur. J. Oper. Res.
[51] Csaba Szepesvári, et al. Mixing Time Estimation in Reversible Markov Chains from a Single Sample Path, 2015, NIPS.
[52] Edward J. Sondik, et al. The optimal control of partially observable Markov processes, 1971.
[53] Byron Boots, et al. Closing the learning-planning loop with predictive state representations, 2009, Int. J. Robotics Res.
[54] Michael I. Jordan, et al. Reinforcement Learning Algorithm for Partially Observable Markov Decision Problems, 1994, NIPS.
[55] Jeff G. Schneider, et al. Policy Search by Dynamic Programming, 2003, NIPS.