[1] Geoffrey J. Gordon, et al. A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning, 2010, AISTATS.
[2] Yuval Tassa, et al. Safe Exploration in Continuous Action Spaces, 2018, ArXiv.
[3] Marc Teboulle, et al. Mirror descent and nonlinear projected subgradient methods for convex optimization, 2003, Oper. Res. Lett.
[4] Byron Boots, et al. Accelerating Imitation Learning with Predictive Models, 2018, AISTATS.
[5] C. Cordell Green, et al. What Is Program Synthesis?, 1985, J. Autom. Reason.
[6] Andreas Krause, et al. Safe Model-based Reinforcement Learning with Stability Guarantees, 2017, NIPS.
[7] Abhinav Verma, et al. Programmatically Interpretable Reinforcement Learning, 2018, ICML.
[8] Armando Solar-Lezama, et al. Learning to Infer Graphics Programs from Hand-Drawn Images, 2017, NeurIPS.
[9] Alec Radford, et al. Proximal Policy Optimization Algorithms, 2017, ArXiv.
[10] Swarat Chaudhuri, et al. HOUDINI: Lifelong Learning as Program Synthesis, 2018, NeurIPS.
[11] Swarat Chaudhuri, et al. Bridging boolean and quantitative synthesis using smoothed proof search, 2014, POPL.
[12] Yisong Yue, et al. Smooth Imitation Learning for Online Sequence Prediction, 2016, ICML.
[13] Sergey Levine, et al. Trust Region Policy Optimization, 2015, ICML.
[14] Byron Boots, et al. Dual Policy Iteration, 2018, NeurIPS.
[15] Rajeev Alur, et al. Syntax-guided synthesis, 2013, FMCAD.
[16] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.
[17] Sham M. Kakade. On the sample complexity of reinforcement learning, 2003.
[18] Swarat Chaudhuri, et al. Neural Sketch Learning for Conditional Program Generation, 2017, ICLR.
[19] Yuval Tassa, et al. Continuous control with deep reinforcement learning, 2015, ICLR.
[20] Heinz H. Bauschke, et al. Convex Analysis and Monotone Operator Theory in Hilbert Spaces, 2011, CMS Books in Mathematics.
[21] Adam Tauman Kalai, et al. Online convex optimization in the bandit setting: gradient descent without a gradient, 2004, SODA '05.
[22] Nolan Wagener, et al. Fast Policy Learning through Imitation and Reinforcement, 2018, UAI.
[23] Isil Dillig, et al. Synthesizing data structure transformations from input-output examples, 2015, PLDI.
[24] Philip Bachman, et al. Deep Reinforcement Learning that Matters, 2017, AAAI.
[25] Sanjit A. Seshia, et al. Combinatorial sketching for finite programs, 2006, ASPLOS XII.
[26] Arjun Radhakrishna, et al. Scaling Enumerative Program Synthesis via Divide and Conquer, 2017, TACAS.
[27] Ufuk Topcu, et al. Safe Reinforcement Learning via Shielding, 2017, AAAI.
[28] Swarat Chaudhuri, et al. Control Regularization for Reduced Variance Reinforcement Learning, 2019, ICML.
[29] Sumit Gulwani, et al. FlashMeta: a framework for inductive program synthesis, 2015, OOPSLA.
[30] Armando Solar-Lezama, et al. Verifiable Reinforcement Learning via Policy Extraction, 2018, NeurIPS.
[31] A. S. Nemirovsky and D. B. Yudin. Problem Complexity and Method Efficiency in Optimization, 1983.
[32] Sergey Levine, et al. Guided Policy Search via Approximate Mirror Descent, 2016, NIPS.
[33] Yishay Mansour, et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation, 1999, NIPS.
[34] Krishnendu Chatterjee, et al. Better Quality in Synthesis through Quantitative Objectives, 2009, CAV.
[35] John Langford, et al. Learning to Search Better than Your Teacher, 2015, ICML.
[36] Byron Boots, et al. Truncated Horizon Policy Search: Combining Reinforcement Learning & Imitation Learning, 2018, ICLR.
[37] Tore Hägglund, et al. Automatic tuning of simple regulators with specifications on phase and amplitude margins, 1984, Autom.
[38] Byron Boots, et al. Predictor-Corrector Policy Optimization, 2018, ICML.
[39] Pieter Abbeel, et al. Constrained Policy Optimization, 2017, ICML.
[40] R. Bellman, et al. On the "bang-bang" control problem, 1956.
[41] Andreas Krause, et al. Learning programs from noisy data, 2016, POPL.
[42] Pieter Abbeel, et al. Benchmarking Deep Reinforcement Learning for Continuous Control, 2016, ICML.
[43] J. Andrew Bagnell, et al. Reinforcement and Imitation Learning via Interactive No-Regret Learning, 2014, ArXiv.
[44] Yun Li, et al. PID control system analysis, design, and technology, 2005, IEEE Transactions on Control Systems Technology.
[45] Yisong Yue, et al. Batch Policy Learning under Constraints, 2019, ICML.