[1] Sergey Levine et al. Reinforcement Learning with Deep Energy-Based Policies, 2017, ICML.
[2] Daniel A. Braun et al. Bounded Rational Decision-Making in Feedforward Neural Networks, 2016, UAI.
[3] Naftali Tishby et al. Trading Value and Information in MDPs, 2012.
[4] Jordi Grau-Moya et al. Soft Q-Learning with Mutual-Information Regularization, 2018, ICLR.
[5] Vicenç Gómez et al. A Unified View of Entropy-Regularized Markov Decision Processes, 2017, arXiv.
[6] Chong Wang et al. Stochastic Variational Inference, 2012, J. Mach. Learn. Res.
[7] Dale Schuurmans et al. Bridging the Gap Between Value and Policy Based Reinforcement Learning, 2017, NIPS.
[8] Daniel Polani et al. Information Theory of Decisions and Actions, 2011.
[9] Jordi Grau-Moya et al. Planning with Information-Processing Constraints and Model Uncertainty in Markov Decision Processes, 2016, ECML/PKDD.
[10] John N. Tsitsiklis et al. Neuro-Dynamic Programming, 1996, Encyclopedia of Machine Learning.
[11] Thomas M. Cover et al. Elements of Information Theory, 2005.
[12] Pieter Abbeel et al. Equivalence Between Policy Gradients and Soft Q-Learning, 2017, arXiv.
[13] Sergey Levine et al. InfoBot: Transfer and Exploration via the Information Bottleneck, 2019, ICLR.
[14] Daniel A. Braun et al. A Reward-Maximizing Spiking Neuron as a Bounded Rational Decision Maker, 2015, Neural Computation.
[15] Jordi Grau-Moya et al. Bounded Rationality, Abstraction, and Hierarchical Decision-Making: An Information-Theoretic Optimality Principle, 2015, Front. Robot. AI.
[16] Martha White et al. Linear Off-Policy Actor-Critic, 2012, ICML.
[17] Jimmy Ba et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.
[18] Henry Zhu et al. Soft Actor-Critic Algorithms and Applications, 2018, arXiv.
[19] Haitham Bou-Ammar et al. An Information-Theoretic Optimality Principle for Deep Reinforcement Learning, 2017, arXiv.
[20] Naftali Tishby et al. The Information Bottleneck Method, 2000, arXiv.
[21] Daan Wierstra et al. Stochastic Backpropagation and Approximate Inference in Deep Generative Models, 2014, ICML.
[22] Sergey Levine et al. Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor, 2018, ICML.
[23] Sergey Levine et al. Deep Reinforcement Learning in a Handful of Trials using Probabilistic Dynamics Models, 2018, NeurIPS.
[24] Sergey Levine et al. Reinforcement Learning and Control as Probabilistic Inference: Tutorial and Review, 2018, arXiv.
[25] C. Sims. Implications of Rational Inattention, 2003.
[26] Yishay Mansour et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation, 1999, NIPS.
[27] Yuval Tassa et al. Continuous Control with Deep Reinforcement Learning, 2015, ICLR.
[28] Yuval Tassa et al. Maximum a Posteriori Policy Optimisation, 2018, ICLR.
[29] David Silver et al. Deep Reinforcement Learning with Double Q-Learning, 2015, AAAI.
[30] Herke van Hoof et al. Addressing Function Approximation Error in Actor-Critic Methods, 2018, ICML.
[31] Sergey Levine et al. Trust Region Policy Optimization, 2015, ICML.
[32] Vicenç Gómez et al. Dynamic Policy Programming with Function Approximation, 2011, AISTATS.
[33] Daniel A. Braun et al. An Information-Theoretic On-line Learning Principle for Specialization in Hierarchical Decision-Making Systems, 2019, IEEE 58th Conference on Decision and Control (CDC).
[34] Max Welling et al. Auto-Encoding Variational Bayes, 2013, ICLR.
[35] Roy Fox et al. Taming the Noise in Reinforcement Learning via Soft Updates, 2015, UAI.
[36] Alec Radford et al. Proximal Policy Optimization Algorithms, 2017, arXiv.
[37] J. Andrew Bagnell et al. Modeling Purposeful Adaptive Behavior with the Principle of Maximum Causal Entropy, 2010.
[38] Michael I. Jordan et al. RLlib: Abstractions for Distributed Reinforcement Learning, 2017, ICML.