On overfitting and asymptotic bias in batch reinforcement learning with partial observability
Vincent François-Lavet | Raphael Fonteneau | Damien Ernst