Sergey Levine | Natasha Jaques | Stuart Russell | Andrew Critch | Alexandre Bayen | Eugene Vinitsky | Michael Dennis
[1] Greg Turk, et al. Preparing for the Unknown: Learning a Universal Policy with Online System Identification, 2017, Robotics: Science and Systems.
[2] Kris M. Kitani, et al. VADRA: Visual Adversarial Domain Randomization and Augmentation, 2018, ArXiv.
[3] Jakub W. Pachocki, et al. Learning dexterous in-hand manipulation, 2018, Int. J. Robotics Res.
[4] H. Robbins. Some aspects of the sequential design of experiments, 1952.
[5] Mi-Ching Tsai, et al. Robust and Optimal Control, 2014.
[6] Julian Togelius, et al. Rotation, Translation, and Cropping for Zero-Shot Generalization, 2020, 2020 IEEE Conference on Games (CoG).
[7] Joel Z. Leibo, et al. Autocurricula and the Emergence of Innovation from Social Interaction: A Manifesto for Multi-Agent Intelligence Research, 2019, ArXiv.
[8] Laurent El Ghaoui, et al. Robust Control of Markov Decision Processes with Uncertain Transition Matrices, 2005, Oper. Res.
[9] Lakmal Seneviratne, et al. Adaptive Control of Robot Manipulators, 1992, Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems.
[10] Sergey Levine, et al. Unsupervised Meta-Learning for Reinforcement Learning, 2018, ArXiv.
[11] Marek Petrik, et al. Safe Policy Improvement by Minimizing Robust Baseline Regret, 2016, NIPS.
[12] James Davidson, et al. Supervision via competition: Robot adversaries for learning tasks, 2016, 2017 IEEE International Conference on Robotics and Automation (ICRA).
[13] Karl Johan Åström, et al. Theory and applications of adaptive control - A survey, 1983, Autom.
[14] Jun Morimoto, et al. Robust Reinforcement Learning, 2005, Neural Computation.
[15] Sergey Levine, et al. Diversity is All You Need: Learning Skills without a Reward Function, 2018, ICLR.
[16] Atil Iscen, et al. Sim-to-Real: Learning Agile Locomotion For Quadruped Robots, 2018, Robotics: Science and Systems.
[17] Danica Kragic, et al. Reinforcement Learning for Pivoting Task, 2017, ArXiv.
[18] Sébastien Bubeck, et al. Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems, 2012, Found. Trends Mach. Learn.
[19] Abraham Wald, et al. Statistical Decision Functions, 1951.
[20] Christopher Joseph Pal, et al. Active Domain Randomization, 2019, CoRL.
[21] Martin Peterson, et al. An Introduction to Decision Theory, 2009.
[22] Wojciech Zaremba, et al. Domain randomization for transferring deep neural networks from simulation to the real world, 2017, 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).
[23] Jürgen Schmidhuber, et al. Formal Theory of Creativity, Fun, and Intrinsic Motivation (1990–2010), 2010, IEEE Transactions on Autonomous Mental Development.
[24] Christos H. Papadimitriou, et al. Games against nature, 1985, 24th Annual Symposium on Foundations of Computer Science (SFCS 1983).
[25] Andrew Y. Ng, et al. Solving Uncertain Markov Decision Processes, 2001.
[26] Alex Graves, et al. Automated Curriculum Learning for Neural Networks, 2017, ICML.
[27] Alec Radford, et al. Proximal Policy Optimization Algorithms, 2017, ArXiv.
[28] John Schulman, et al. Teacher–Student Curriculum Learning, 2017, IEEE Transactions on Neural Networks and Learning Systems.
[29] Igor Mordatch, et al. Emergent Tool Use From Multi-Agent Autocurricula, 2019, ICLR.
[30] Yuval Tassa, et al. MuJoCo: A physics engine for model-based control, 2012, 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems.
[31] Joshua B. Tenenbaum, et al. Learning with AMIGo: Adversarially Motivated Intrinsic Goals, 2020, ICLR.
[32] Sergey Levine, et al. Time-Contrastive Networks: Self-Supervised Learning from Video, 2017, 2018 IEEE International Conference on Robotics and Automation (ICRA).
[33] Dima Damen, et al. Egocentric Real-time Workspace Monitoring using an RGB-D camera, 2012, 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems.
[34] Garud Iyengar, et al. Robust Dynamic Programming, 2005, Math. Oper. Res.
[35] Michael A. Osborne, et al. The future of employment: How susceptible are jobs to computerisation?, 2017.
[36] Joel Lehman, et al. Enhanced POET: Open-Ended Reinforcement Learning through Unbounded Invention of Learning Challenges and their Solutions, 2020, ICML.
[37] Sergey Levine, et al. Adversarial Policies: Attacking Deep Reinforcement Learning, 2019, ICLR.
[38] Craig Boutilier, et al. Robust Online Optimization of Reward-Uncertain MDPs, 2011, IJCAI.
[39] Sergey Levine, et al. (CAD)²RL: Real Single-Image Flight without a Single Real Image, 2016, Robotics: Science and Systems.
[40] Shie Mannor, et al. Scaling Up Robust MDPs using Function Approximation, 2014, ICML.
[41] T. L. Lai and Herbert Robbins. Asymptotically Efficient Adaptive Allocation Rules, 1985, Adv. Appl. Math.
[42] S. Shankar Sastry, et al. On Gradient-Based Learning in Continuous Games, 2018, SIAM J. Math. Data Sci.
[43] S. Chaiklin. The zone of proximal development in Vygotsky's analysis of learning and instruction, 2003.
[44] Craig Boutilier, et al. Regret-based Reward Elicitation for Markov Decision Processes, 2009, UAI.
[45] Leonard J. Savage, et al. The Theory of Statistical Decision, 1951.
[46] Rui Wang, et al. Paired Open-Ended Trailblazer (POET): Endlessly Generating Increasingly Complex and Diverse Learning Environments and Their Solutions, 2019, ArXiv.
[47] Nick Jakobi, et al. Evolutionary Robotics and the Radical Envelope-of-Noise Hypothesis, 1997, Adapt. Behav.
[48] Ilya Kostrikov, et al. Intrinsic Motivation and Automatic Curricula via Asymmetric Self-Play, 2017, ICLR.