Dynamic regret convergence analysis and an adaptive regularization algorithm for on-policy robot imitation learning
Ken Goldberg | Ajay Kumar Tanwani | Michael Laskey | Anil Aswani | Jonathan Lee
[1] Byron Boots,et al. Online Learning with Continuous Variations: Dynamic Regret and Reductions , 2020, AISTATS.
[2] Michael Laskey,et al. Stability Analysis of On-Policy Imitation Learning Algorithms Using Dynamic Regret , 2018 .
[3] S. Banach. Sur les opérations dans les ensembles abstraits et leur application aux équations intégrales , 1922 .
[4] Martin Zinkevich,et al. Online Convex Programming and Generalized Infinitesimal Gradient Ascent , 2003, ICML.
[5] Pieter Abbeel,et al. An Algorithmic Perspective on Imitation Learning , 2018, Found. Trends Robotics.
[6] Byron Boots,et al. Agile Off-Road Autonomous Driving Using End-to-End Deep Imitation Learning , 2017, ArXiv.
[7] Aryan Mokhtari,et al. Optimization in Dynamic Environments : Improved Regret Rates for Strongly Convex Problems , 2016 .
[8] Byron Boots,et al. Deeply AggreVaTeD: Differentiable Imitation Learning for Sequential Prediction , 2017, ICML.
[9] Nolan Wagener,et al. Fast Policy Learning through Imitation and Reinforcement , 2018, UAI.
[10] Karthik Sridharan,et al. Optimization, Learning, and Games with Predictable Sequences , 2013, NIPS.
[11] D. Bertsekas. Convergence of discretization procedures in dynamic programming , 1975 .
[12] Sham M. Kakade,et al. Mind the Duality Gap: Logarithmic regret algorithms for online optimization , 2008, NIPS.
[13] Byron Boots,et al. Accelerating Imitation Learning with Predictive Models , 2018, AISTATS.
[14] Yisong Yue,et al. Smooth Imitation Learning for Online Sequence Prediction , 2016, ICML.
[15] Michael David Laskey,et al. On and Off-Policy Deep Imitation Learning for Robotics , 2018 .
[16] Felix Duvallet,et al. Imitation learning for natural language direction following through unknown environments , 2013, 2013 IEEE International Conference on Robotics and Automation.
[17] Wouter M. Koolen,et al. A Closer Look at Adaptive Regret , 2012, J. Mach. Learn. Res.
[18] Marcin Andrychowicz,et al. One-Shot Imitation Learning , 2017, NIPS.
[19] Elad Hazan,et al. Introduction to Online Convex Optimization , 2016, Found. Trends Optim..
[20] Shahin Shahrampour,et al. Online Optimization : Competing with Dynamic Comparators , 2015, AISTATS.
[21] Martial Hebert,et al. Learning monocular reactive UAV control in cluttered natural environments , 2012, 2013 IEEE International Conference on Robotics and Automation.
[22] Yoram Singer,et al. Adaptive Subgradient Methods for Online Learning and Stochastic Optimization , 2011, J. Mach. Learn. Res.
[23] Siddhartha Srinivasa,et al. Imitation Learning as f-Divergence Minimization , 2019, WAFR.
[24] Byron Boots,et al. Agile Autonomous Driving using End-to-End Deep Imitation Learning , 2017, Robotics: Science and Systems.
[25] Kyunghyun Cho,et al. Query-Efficient Imitation Learning for End-to-End Simulated Driving , 2017, AAAI.
[26] Ajay Kumar Tanwani,et al. A Dynamic Regret Analysis and Adaptive Regularization Algorithm for On-Policy Robot Imitation Learning , 2018, WAFR.
[27] Mohamed Medhat Gaber,et al. Deep imitation learning for 3D navigation tasks , 2017, Neural Computing and Applications.
[28] J A Bagnell,et al. An Invitation to Imitation , 2015 .
[29] S. Sastry. Nonlinear Systems: Analysis, Stability, and Control , 1999 .
[30] Jinfeng Yi,et al. Tracking Slowly Moving Clairvoyant: Optimal Dynamic Regret of Online Learning with True and Noisy Gradient , 2016, ICML.
[31] Dean Pomerleau,et al. ALVINN, an autonomous land vehicle in a neural network , 2015 .
[32] Sébastien Bubeck,et al. Introduction to Online Optimization , 2011 .
[33] M. Fukushima. Merit Functions for Variational Inequality and Complementarity Problems , 1996 .
[34] Rebecca Willett,et al. Online Convex Optimization in Dynamic Environments , 2015, IEEE Journal of Selected Topics in Signal Processing.
[35] F. Facchinei,et al. Finite-Dimensional Variational Inequalities and Complementarity Problems , 2003 .
[36] Ken Goldberg,et al. Deep Imitation Learning for Complex Manipulation Tasks from Virtual Reality Teleoperation , 2017, ICRA.
[37] Karthik Sridharan,et al. Statistical Learning and Sequential Prediction , 2014 .
[38] Seshadhri Comandur,et al. Adaptive Algorithms for Online Decision Problems , 2007, Electronic Colloquium on Computational Complexity, Report No. 88.
[39] Elad Hazan,et al. Logarithmic regret algorithms for online convex optimization , 2006, Machine Learning.
[40] Byron Boots,et al. Convergence of Value Aggregation for Imitation Learning , 2018, AISTATS.
[41] P. Olver. Nonlinear Systems , 2013 .
[42] Anca D. Dragan,et al. Comparing human-centric and robot-centric sampling for robot deep learning from demonstrations , 2016, 2017 IEEE International Conference on Robotics and Automation (ICRA).
[43] Elad Hazan,et al. An optimal algorithm for stochastic strongly-convex optimization , 2010, arXiv:1006.2425.
[44] Luca Bascetta,et al. Policy gradient in Lipschitz Markov Decision Processes , 2015, Machine Learning.
[45] Jinfeng Yi,et al. Improved Dynamic Regret for Non-degenerate Functions , 2016, NIPS.
[46] Sergey Levine,et al. Trust Region Policy Optimization , 2015, ICML.
[47] Kavosh Asadi,et al. Lipschitz Continuity in Model-based Reinforcement Learning , 2018, ICML.
[48] Geoffrey J. Gordon,et al. A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning , 2010, AISTATS.
[49] Karl Hinderer,et al. Lipschitz Continuity of Value Functions in Markovian Decision Processes , 2005, Math. Methods Oper. Res..