Sparse Proximal Reinforcement Learning via Nested Optimization
Kotaro Hirasawa | Qibing Jin | Dazi Li | Tianheng Song
[1] Huaguang Zhang, et al. Fault-Tolerant Controller Design for a Class of Nonlinear MIMO Discrete-Time Systems via Online Reinforcement Learning Algorithm, 2016, IEEE Transactions on Systems, Man, and Cybernetics: Systems.
[2] Dazi Li, et al. Online ℓ2-regularized reinforcement learning via RBF neural network, 2016, 2016 Chinese Control and Decision Conference (CCDC).
[3] Shalabh Bhatnagar, et al. Toward Off-Policy Learning Control with Function Approximation, 2010, ICML.
[4] Richard S. Sutton, et al. Learning to predict by the methods of temporal differences, 1988, Machine Learning.
[5] Bo Liu, et al. Proximal Reinforcement Learning: A New Theory of Sequential Decision Making in Primal-Dual Spaces, 2014, ArXiv.
[6] Dewen Hu, et al. Multiobjective Reinforcement Learning: A Comprehensive Overview, 2015, IEEE Transactions on Systems, Man, and Cybernetics: Systems.
[7] John N. Tsitsiklis, et al. Analysis of temporal-difference learning with function approximation, 1996, NIPS.
[8] Michail G. Lagoudakis, et al. Least-Squares Policy Iteration, 2003, J. Mach. Learn. Res.
[9] Huizhen Yu, et al. Least Squares Temporal Difference Methods: An Analysis under General Conditions, 2012, SIAM J. Control. Optim.
[10] Csaba Szepesvári, et al. Algorithms for Reinforcement Learning, 2010, Synthesis Lectures on Artificial Intelligence and Machine Learning.
[11] Toru Maruyama (丸山徹). On some developments in convex analysis (in Japanese), 1977.
[12] Zhiwei Qin, et al. Sparse Reinforcement Learning via Convex Optimization, 2014, ICML.
[13] H. Zou, et al. Regularization and variable selection via the elastic net, 2005.
[14] Matthieu Geist, et al. Off-policy learning with eligibility traces: a survey, 2013, J. Mach. Learn. Res.
[15] Martin A. Riedmiller. Neural Fitted Q Iteration - First Experiences with a Data Efficient Neural Reinforcement Learning Method, 2005, ECML.
[16] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.
[17] R. Sutton, et al. A convergent O(n) algorithm for off-policy temporal-difference learning with linear function approximation, 2008, NIPS.
[18] Qibing Jin, et al. Regularization and feature selection in least squares temporal difference with gradient correction, 2016, 2016 12th World Congress on Intelligent Control and Automation (WCICA).
[20] Ding Wang, et al. Feature selection and feature learning for high-dimensional batch reinforcement learning: A survey, 2015, International Journal of Automation and Computing.
[21] Shie Mannor, et al. Regularized Policy Iteration, 2008, NIPS.
[22] Stephen P. Boyd, et al. Proximal Algorithms, 2013, Found. Trends Optim.
[23] Bo Liu, et al. Sparse Q-learning with Mirror Descent, 2012, UAI.
[24] Steven J. Bradtke, et al. Linear Least-Squares algorithms for temporal difference learning, 2004, Machine Learning.
[25] J. H. Wilkinson, et al. Note on the iterative refinement of least squares solution, 1966.
[26] Yang Li, et al. Adaptive Neural Network Control of AUVs With Control Input Nonlinearities Using Reinforcement Learning, 2017, IEEE Transactions on Systems, Man, and Cybernetics: Systems.
[27] Yujing Hu, et al. Accelerating Multiagent Reinforcement Learning by Equilibrium Transfer, 2015, IEEE Transactions on Cybernetics.
[28] Shalabh Bhatnagar, et al. Fast gradient-descent methods for temporal-difference learning with linear function approximation, 2009, ICML '09.
[29] Andrew Y. Ng, et al. Regularization and feature selection in least-squares temporal difference learning, 2009, ICML '09.
[30] H. He, et al. Efficient Reinforcement Learning Using Recursive Least-Squares Methods, 2011, J. Artif. Intell. Res.
[31] Stephen P. Boyd, et al. Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers, 2011, Found. Trends Mach. Learn.
[32] Reinaldo A. C. Bianchi, et al. Heuristically-Accelerated Multiagent Reinforcement Learning, 2014, IEEE Transactions on Cybernetics.
[33] Derong Liu, et al. Policy Iteration Adaptive Dynamic Programming Algorithm for Discrete-Time Nonlinear Systems, 2014, IEEE Transactions on Neural Networks and Learning Systems.
[34] Bo Liu, et al. Regularized Off-Policy TD-Learning, 2012, NIPS.
[35] Csaba Szepesvári, et al. Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path, 2006, Machine Learning.
[36] Shalabh Bhatnagar, et al. Convergent Temporal-Difference Learning with Arbitrary Smooth Function Approximation, 2009, NIPS.
[37] Matthew W. Hoffman, et al. Regularized Least Squares Temporal Difference Learning with Nested ℓ2 and ℓ1 Penalization, 2011, EWRL.
[38] Kotaro Hirasawa, et al. Kernel-Based Least Squares Temporal Difference With Gradient Correction, 2016, IEEE Transactions on Neural Networks and Learning Systems.
[39] Stephen P. Boyd, et al. Convex Optimization, 2004, Algorithms and Theory of Computation Handbook.
[40] Xin Xu, et al. Kernel-Based Least Squares Policy Iteration for Reinforcement Learning, 2007, IEEE Transactions on Neural Networks.
[41] Sean P. Meyn, et al. The O.D.E. Method for Convergence of Stochastic Approximation and Reinforcement Learning, 2000, SIAM J. Control. Optim.
[42] Ronald Parr, et al. Linear Complementarity for Regularized Policy Evaluation and Improvement, 2010, NIPS.
[43] M. Loth, et al. Sparse Temporal Difference Learning Using LASSO, 2007, 2007 IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning.