[1] J. Andrew Bagnell, et al. Modeling Purposeful Adaptive Behavior with the Principle of Maximum Causal Entropy, 2010.
[2] C. Villani. Optimal Transport: Old and New, 2008.
[3] Sergey Levine, et al. Trust Region Policy Optimization, 2015, ICML.
[4] Amir Dembo, et al. Large Deviations Techniques and Applications, 1998.
[5] Pieter Abbeel, et al. Equivalence Between Policy Gradients and Soft Q-Learning, 2017, arXiv.
[6] Koray Kavukcuoglu, et al. PGQ: Combining policy gradient and Q-learning, 2016, arXiv.
[7] Alex Graves, et al. Asynchronous Methods for Deep Reinforcement Learning, 2016, ICML.
[8] Dale Schuurmans, et al. Bridging the Gap Between Value and Policy Based Reinforcement Learning, 2017, NIPS.
[9] F. Opitz. Information geometry and its applications, 2012, 2012 9th European Radar Conference.
[10] Sergey Levine, et al. High-Dimensional Continuous Control Using Generalized Advantage Estimation, 2015, ICLR.
[11] Alexander J. Smola, et al. Unifying Divergence Minimization and Statistical Inference Via Convex Duality, 2006, COLT.
[12] Roy Fox, et al. Taming the Noise in Reinforcement Learning via Soft Updates, 2015, UAI.
[13] Vicenç Gómez, et al. A unified view of entropy-regularized Markov decision processes, 2017, arXiv.
[14] Stefano Soatto, et al. Deep relaxation: partial differential equations for optimizing deep neural networks, 2017, Research in the Mathematical Sciences.
[15] Heinz H. Bauschke, et al. Convex Analysis and Monotone Operator Theory in Hilbert Spaces, 2011, CMS Books in Mathematics.
[16] Shane Legg, et al. Human-level control through deep reinforcement learning, 2015, Nature.
[17] Sergey Levine, et al. Reinforcement Learning with Deep Energy-Based Policies, 2017, ICML.
[18] Dale Schuurmans, et al. Trust-PCL: An Off-Policy Trust Region Method for Continuous Control, 2017, ICLR.