Continuous-time Value Function Approximation in Reproducing Kernel Hilbert Spaces
Motoya Ohnishi | Masahiro Yukawa | Mikael Johansson | Masashi Sugiyama
[1] Ingo Steinwart, et al. On the Influence of the Kernel on the Consistency of Support Vector Machines, 2002, J. Mach. Learn. Res.
[2] R. D. Wood, et al. Nonlinear Continuum Mechanics for Finite Element Analysis, 1997.
[3] Masahiro Yukawa, et al. Multikernel Adaptive Filtering, 2012, IEEE Transactions on Signal Processing.
[4] Paulo Tabuada, et al. Control Barrier Function Based Quadratic Programs with Application to Automotive Safety Systems, 2016, ArXiv.
[5] Xin Xu, et al. Kernel-Based Least Squares Policy Iteration for Reinforcement Learning, 2007, IEEE Transactions on Neural Networks.
[6] Yunpeng Pan, et al. Probabilistic Differential Dynamic Programming, 2014, NIPS.
[7] Paulo Tabuada, et al. Robustness of Control Barrier Functions for Safety Critical Control, 2016, ADHS.
[8] Weifeng Liu, et al. Kernel Adaptive Filtering, 2010.
[9] Li Wang, et al. Safe Learning of Quadrotor Dynamics Using Barrier Certificates, 2017, 2018 IEEE International Conference on Robotics and Automation (ICRA).
[10] Li Wang, et al. Safety-aware Adaptive Reinforcement Learning with Applications to Brushbot Navigation, 2018, ArXiv.
[11] Wojciech Zaremba, et al. OpenAI Gym, 2016, ArXiv.
[12] H. Minh, et al. Some Properties of Gaussian Reproducing Kernel Hilbert Spaces and Their Implications for Function Approximation and Learning Theory, 2010.
[13] Paul Honeine, et al. Online Prediction of Time Series Data With Kernels, 2009, IEEE Transactions on Signal Processing.
[14] V. Borkar. Controlled diffusion processes, 2005, math/0511077.
[15] Le Song, et al. Kernel Embeddings of Conditional Distributions: A Unified Kernel Framework for Nonparametric Inference in Graphical Models, 2013.
[16] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.
[17] Guy Lever, et al. Modelling Transition Dynamics in MDPs with RKHS Embeddings, 2012, ICML.
[18] Carl E. Rasmussen, et al. Gaussian Processes for Machine Learning, 2005, Adaptive Computation and Machine Learning.
[19] Kenji Fukumizu, et al. Hilbert Space Embeddings of POMDPs, 2012, UAI.
[20] A. Berlinet, et al. Reproducing Kernel Hilbert Spaces in Probability and Statistics, 2004.
[21] John N. Tsitsiklis, et al. Analysis of Temporal-Difference Learning with Function Approximation, 1996, NIPS.
[22] Paulo Tabuada, et al. Control Barrier Function Based Quadratic Programs for Safety Critical Systems, 2016, IEEE Transactions on Automatic Control.
[23] Kenji Doya, et al. Reinforcement Learning in Continuous Time and Space, 2000, Neural Computation.
[24] Alejandro Ribeiro, et al. Parsimonious Online Learning with Kernels via Sparse Projections in Function Space, 2016, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[25] Luca Antiga, et al. Automatic Differentiation in PyTorch, 2017.
[26] J. Andrew Bagnell, et al. Online Bellman Residual and Temporal Difference Algorithms with Predictive Error Guarantees, 2016, IJCAI.
[27] Rémi Munos, et al. Reinforcement Learning for Continuous Stochastic Control Problems, 1997, NIPS.
[28] Mi-Ching Tsai, et al. Robust and Optimal Control, 2014.
[29] F. L. Lewis, et al. Reinforcement Learning and Adaptive Dynamic Programming for Feedback Control, 2009, IEEE Circuits and Systems Magazine.
[30] Javier García, et al. A Comprehensive Survey on Safe Reinforcement Learning, 2015, J. Mach. Learn. Res.
[31] Masahiro Yukawa, et al. Adaptive Nonlinear Estimation Based on Parallel Projection Along Affine Subspaces in Reproducing Kernel Hilbert Space, 2015, IEEE Transactions on Signal Processing.
[32] Yuval Tassa, et al. DeepMind Control Suite, 2018, ArXiv.
[33] Masashi Sugiyama, et al. Statistical Reinforcement Learning: Modern Machine Learning Approaches, 2015, Chapman and Hall/CRC Machine Learning and Pattern Recognition Series.
[34] Frank Allgöwer, et al. Constructive Safety Using Control Barrier Functions, 2007.
[35] S. Shreve, et al. Stochastic Differential Equations, 1955, Mathematical Proceedings of the Cambridge Philosophical Society.
[36] Thomas B. Schön, et al. Linearly Constrained Gaussian Processes, 2017, NIPS.
[37] R. Khasminskii. Stochastic Stability of Differential Equations, 1980.
[38] Stefan Schaal, et al. Reinforcement Learning of Motor Skills in High Dimensions: A Path Integral Approach, 2010, 2010 IEEE International Conference on Robotics and Automation.
[39] Aaron D. Ames, et al. Safety Barrier Certificates for Collisions-Free Multirobot Systems, 2017, IEEE Transactions on Robotics.
[40] Li Wang, et al. Barrier-Certified Adaptive Reinforcement Learning With Applications to Brushbot Navigation, 2018, IEEE Transactions on Robotics.
[41] L. C. Baird, et al. Reinforcement Learning in Continuous Time: Advantage Updating, 1994, Proceedings of 1994 IEEE International Conference on Neural Networks (ICNN'94).
[42] Gavin Taylor, et al. Kernelized Value Function Approximation for Reinforcement Learning, 2009, ICML.
[43] Koushil Sreenath, et al. Discrete Control Barrier Functions for Safety-Critical Control of Discrete Systems with Application to Bipedal Robot Navigation, 2017, Robotics: Science and Systems.
[44] N. Aronszajn. Theory of Reproducing Kernels, 1950.
[45] W. Fleming, et al. Controlled Markov Processes and Viscosity Solutions, 1992.
[46] Liming Xiang, et al. Kernel-Based Reinforcement Learning, 2006, ICIC.
[47] Magnus Egerstedt, et al. Nonsmooth Barrier Functions With Applications to Multi-Robot Systems, 2017, IEEE Control Systems Letters.
[48] G. Strang. Introduction to Linear Algebra, 1993.
[49] Peter Stone, et al. Policy Evaluation in Continuous MDPs With Efficient Kernelized Gradient Temporal Difference, 2017, IEEE Transactions on Automatic Control.
[50] Aaron D. Ames, et al. Sufficient Conditions for the Lipschitz Continuity of QP-Based Multi-Objective Control of Humanoid Robots, 2013, 52nd IEEE Conference on Decision and Control.
[51] I. Yamada, et al. Adaptive Projected Subgradient Method for Asymptotic Minimization of Sequence of Nonnegative Convex Functions, 2005.
[52] J. Tsitsiklis, et al. Convergence Rate of Linear Two-Time-Scale Stochastic Approximation, 2004, math/0405287.
[53] P. Olver. Nonlinear Systems, 2013.
[54] Ding-Xuan Zhou. Derivative Reproducing Properties for Kernel Methods in Learning Theory, 2008.
[55] Richard S. Sutton, et al. A Convergent O(n) Temporal-Difference Algorithm for Off-Policy Learning with Linear Function Approximation, 2008, NIPS.
[56] Daniel Liberzon. Calculus of Variations and Optimal Control Theory: A Concise Introduction, 2012.
[57] Frank L. Lewis, et al. Online Actor Critic Algorithm to Solve the Continuous-Time Infinite Horizon Optimal Control Problem, 2009, 2009 International Joint Conference on Neural Networks.
[58] Shie Mannor, et al. Reinforcement Learning with Gaussian Processes, 2005, ICML.
[59] J. Doyle, et al. Robust and Optimal Control, 1995, Proceedings of 35th IEEE Conference on Decision and Control.
[60] Yuval Tassa, et al. Stochastic Differential Dynamic Programming, 2010, Proceedings of the 2010 American Control Conference.
[61] Leemon C. Baird, et al. Residual Algorithms: Reinforcement Learning with Function Approximation, 1995, ICML.