Guy Lorberbom | Chris J. Maddison | Nicolas Heess | Tamir Hazan | Daniel Tarlow
[1] Marc Toussaint, et al. On Stochastic Optimal Control and Reinforcement Learning by Approximate Inference, 2012, Robotics: Science and Systems.
[2] Stefano Ermon, et al. Exact Sampling with Integer Linear Programs and Random Perturbations, 2016, AAAI.
[3] Yee Whye Teh, et al. The Concrete Distribution: A Continuous Relaxation of Discrete Random Variables, 2016, ICLR.
[4] Simon M. Lucas, et al. A Survey of Monte Carlo Tree Search Methods, 2012, IEEE Transactions on Computational Intelligence and AI in Games.
[5] Marc Toussaint, et al. Probabilistic inference for solving discrete and continuous state Markov Decision Processes, 2006, ICML.
[6] Sergey Levine, et al. Path integral guided policy search, 2017, IEEE International Conference on Robotics and Automation (ICRA).
[7] Demis Hassabis, et al. A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play, 2018, Science.
[8] Wei Chen, et al. Combinatorial Multi-Armed Bandit: General Framework and Applications, 2013, ICML.
[9] E. Gumbel. Statistical Theory of Extreme Values and Some Practical Applications: A Series of Lectures, 1954.
[10] Ryan P. Adams, et al. Randomized Optimum Models for Structured Prediction, 2012, AISTATS.
[11] Ira Pohl, et al. Heuristic Search Viewed as Path Finding in a Graph, 1970, Artif. Intell.
[12] Yuval Tassa, et al. Learning Continuous Control Policies by Stochastic Value Gradients, 2015, NIPS.
[13] Csaba Szepesvári, et al. Bandit Based Monte-Carlo Planning, 2006, ECML.
[14] Max Welling, et al. Stochastic Beams and Where to Find Them: The Gumbel-Top-k Trick for Sampling Sequences Without Replacement, 2019, ICML.
[15] Tommi S. Jaakkola, et al. On the Partition Function and Random Maximum A-Posteriori Perturbations, 2012, ICML.
[16] Yuval Tassa, et al. Maximum a Posteriori Policy Optimisation, 2018, ICLR.
[17] Sergey Levine, et al. End-to-End Training of Deep Visuomotor Policies, 2015, J. Mach. Learn. Res.
[18] David Duvenaud, et al. Sticking the Landing: Simple, Lower-Variance Gradient Estimators for Variational Inference, 2017, NIPS.
[19] J. Pratt. Risk Aversion in the Small and in the Large, 1964.
[20] Tommi S. Jaakkola, et al. Direct Optimization through arg max for Discrete Variational Auto-Encoder, 2018, NeurIPS.
[21] Eric A. Hansen, et al. Anytime Heuristic Search, 2011, J. Artif. Intell. Res.
[22] Judea Pearl, et al. Heuristic Search Theory: Survey of Recent Results, 1981, IJCAI.
[23] Nicolas Heess, et al. Woulda, Coulda, Shoulda: Counterfactually-Guided Policy Search, 2018, ICLR.
[24] Tamir Hazan, et al. Direct Loss Minimization for Structured Prediction, 2010, NIPS.
[25] Brendan O'Donoghue, et al. Variational Bayesian Reinforcement Learning with Regret Bounds, 2018, NeurIPS.
[26] Yang Song, et al. Training Deep Neural Networks via Direct Loss Minimization, 2015, ICML.
[27] Sergey Levine, et al. Guided Policy Search via Approximate Mirror Descent, 2016, NIPS.
[28] Ben Poole, et al. Categorical Reparameterization with Gumbel-Softmax, 2016, ICLR.
[29] Sergey Levine, et al. Trust Region Policy Optimization, 2015, ICML.
[30] Geoffrey E. Hinton, et al. A View of the EM Algorithm that Justifies Incremental, Sparse, and other Variants, 1998, Learning in Graphical Models.
[31] Yee Whye Teh, et al. Particle Value Functions, 2017, ICLR.
[32] Ronald J. Williams. Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning, 1992, Machine Learning.
[33] Daniel Hernández-Hernández, et al. Risk Sensitive Markov Decision Processes, 1997.
[34] George Casella, et al. Implementations of the Monte Carlo EM Algorithm, 2001.
[35] Peter Auer, et al. Using Confidence Bounds for Exploitation-Exploration Trade-offs, 2003, J. Mach. Learn. Res.
[36] George Papandreou, et al. Perturb-and-MAP random fields: Using discrete optimization to learn and sample from energy models, 2011, International Conference on Computer Vision.
[37] Yasemin Altun, et al. Relative Entropy Policy Search, 2010.
[38] Sergey Levine, et al. Variational Policy Search via Trajectory Optimization, 2013, NIPS.
[39] Nicolò Cesa-Bianchi, et al. Combinatorial Bandits, 2012, COLT.