[1] Kagan Tumer, et al. Analyzing and visualizing multiagent rewards in dynamic and stochastic domains, 2008, Autonomous Agents and Multi-Agent Systems.
[2] Geoffrey E. Hinton, et al. On rectified linear units for speech processing, 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.
[3] Jennie Si, et al. Online learning control by association and reinforcement, 2000, Proceedings of the IEEE-INNS-ENNS International Joint Conference on Neural Networks (IJCNN 2000). Neural Computing: New Challenges and Perspectives for the New Millennium.
[4] Joachim M. Buhmann, et al. Kickback Cuts Backprop's Red-Tape: Biologically Plausible Credit Assignment in Neural Networks, 2014, AAAI.
[5] Jan Peters, et al. Policy evaluation with temporal differences: a survey and comparison, 2015, J. Mach. Learn. Res.
[6] Nitish Srivastava, et al. Dropout: a simple way to prevent neural networks from overfitting, 2014, J. Mach. Learn. Res.
[7] Michael Fairbank, et al. Value-gradient learning, 2012, The 2012 International Joint Conference on Neural Networks (IJCNN).
[8] R. Sutton, et al. A convergent O(n) algorithm for off-policy temporal-difference learning with linear function approximation, 2008, NIPS.
[9] Trevor Darrell, et al. End-to-end training of deep visuomotor policies, 2016.
[10] John Darzentas, et al. Problem Complexity and Method Efficiency in Optimization, 1983.
[11] Maxim Raginsky, et al. Information-Based Complexity, Feedback and Dynamics in Convex Programming, 2010, IEEE Transactions on Information Theory.
[12] Duy Nguyen-Tuong, et al. Local Gaussian Process Regression for Real Time Online Model Learning, 2008, NIPS.
[13] Razvan Pascanu, et al. Theano: new features and speed improvements, 2012, ArXiv.
[14] Shane Legg, et al. Human-level control through deep reinforcement learning, 2015, Nature.
[15] John Langford, et al. Doubly Robust Policy Evaluation and Optimization, 2014, ArXiv.
[16] Yoshua Bengio, et al. Deep Sparse Rectifier Neural Networks, 2011, AISTATS.
[17] Abdeslam Boularias, et al. Gradient Weights help Nonparametric Regressors, 2012, NIPS.
[18] Shalabh Bhatnagar, et al. Fast gradient-descent methods for temporal-difference learning with linear function approximation, 2009, ICML '09.
[19] Ronald J. Williams, et al. Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning, 2004, Machine Learning.
[20] Tara N. Sainath, et al. Improving deep neural networks for LVCSR using rectified linear units and dropout, 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.
[21] Martin A. Riedmiller, et al. Reinforcement learning in feedback control, 2011, Machine Learning.
[22] Jan Peters, et al. A Survey on Policy Search for Robotics, 2013, Found. Trends Robotics.
[23] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.
[24] Guy Lever, et al. Deterministic Policy Gradient Algorithms, 2014, ICML.
[25] Kagan Tumer, et al. Combining reward shaping and hierarchies for scaling to large multiagent systems, 2016, The Knowledge Engineering Review.
[26] Kagan Tumer, et al. Unifying temporal and structural credit assignment problems, 2004, Proceedings of the Third International Joint Conference on Autonomous Agents and Multiagent Systems (AAMAS 2004).
[27] Yishay Mansour, et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation, 1999, NIPS.
[28] Richard S. Sutton, et al. Neuronlike adaptive elements that can solve difficult learning control problems, 1983, IEEE Transactions on Systems, Man, and Cybernetics.
[29] Gregory Shakhnarovich, et al. A Consistent Estimator of the Expected Gradient Outerproduct, 2014, UAI.
[30] Thomas B. Schön, et al. From Pixels to Torques: Policy Learning with Deep Dynamical Models, 2015, ICML.
[31] Sham M. Kakade, et al. A Natural Policy Gradient, 2001, NIPS.
[32] David Balduzzi, et al. Deep Online Convex Optimization by Putting Forecaster to Sleep, 2015, ArXiv.
[33] Michail G. Lagoudakis, et al. Coordinated Reinforcement Learning, 2002, ICML.
[34] Stefan Schaal, et al. Policy Gradient Methods for Robotics, 2006, 2006 IEEE/RSJ International Conference on Intelligent Robots and Systems.
[35] Leemon C. Baird, et al. Residual Algorithms: Reinforcement Learning with Function Approximation, 1995, ICML.
[36] Peter Szabó, et al. Learning to Control an Octopus Arm with Gaussian Process Temporal Difference Methods, 2005, NIPS.
[37] John N. Tsitsiklis, et al. Analysis of temporal-difference learning with function approximation, 1996, NIPS.
[38] Adam Tauman Kalai, et al. Online convex optimization in the bandit setting: gradient descent without a gradient, 2004, SODA '05.
[39] John N. Tsitsiklis, et al. Actor-Critic Algorithms, 1999, NIPS.
[40] Michael Fairbank, et al. An Equivalence Between Adaptive Dynamic Programming With a Critic and Backpropagation Through Time, 2013, IEEE Transactions on Neural Networks and Learning Systems.
[41] Geoffrey E. Hinton, et al. Rectified Linear Units Improve Restricted Boltzmann Machines, 2010, ICML.
[42] Michael I. Jordan, et al. Learning to Control an Unstable System with Forward Modeling, 1989, NIPS.
[43] Sergey Levine, et al. End-to-End Training of Deep Visuomotor Policies, 2015, J. Mach. Learn. Res.