André da Motta Salles Barreto | Doina Precup | Mohammad Ghavamzadeh | Amir-massoud Farahmand
[2] A. P. Wieland,et al. Evolving neural network controllers for unstable systems , 1991, IJCNN-91-Seattle International Joint Conference on Neural Networks.
[3] Gerald Tesauro,et al. On-line Policy Improvement using Monte-Carlo Search , 1996, NIPS.
[4] László Györfi,et al. A Probabilistic Theory of Pattern Recognition , 1996, Stochastic Modelling and Applied Probability.
[5] John N. Tsitsiklis,et al. Neuro-Dynamic Programming , 1996, Encyclopedia of Machine Learning.
[6] John N. Tsitsiklis,et al. Analysis of temporal-difference learning with function approximation , 1996, NIPS.
[7] John N. Tsitsiklis,et al. Actor-Critic Algorithms , 1999, NIPS.
[8] Peter L. Bartlett,et al. Rademacher and Gaussian Complexities: Risk Bounds and Structural Results , 2003, J. Mach. Learn. Res..
[9] John N. Tsitsiklis,et al. Simulation-based optimization of Markov reward processes , 2001, IEEE Trans. Autom. Control..
[10] Sham M. Kakade,et al. A Natural Policy Gradient , 2001, NIPS.
[11] John Langford,et al. Approximately Optimal Approximate Reinforcement Learning , 2002, ICML.
[12] Adam Krzyzak,et al. A Distribution-Free Theory of Nonparametric Regression , 2002, Springer series in statistics.
[13] Robert Givan,et al. Approximate Policy Iteration with a Policy Language Bias , 2003, NIPS.
[14] Stefan Schaal,et al. Reinforcement Learning for Humanoid Robotics , 2003 .
[15] Rémi Munos,et al. Error Bounds for Approximate Policy Iteration , 2003, ICML.
[16] Michail G. Lagoudakis,et al. Least-Squares Policy Iteration , 2003, J. Mach. Learn. Res..
[17] Jeff G. Schneider,et al. Policy Search by Dynamic Programming , 2003, NIPS.
[18] Michail G. Lagoudakis,et al. Reinforcement Learning as Classification: Leveraging Modern Classifiers , 2003, ICML.
[19] Shie Mannor,et al. The kernel recursive least-squares algorithm , 2004, IEEE Transactions on Signal Processing.
[20] B. Adams,et al. Dynamic multidrug therapies for HIV: optimal and STI control approaches , 2004, Mathematical Biosciences and Engineering: MBE.
[21] Philip D. Plowright,et al. Convexity , 2019, Optimization for Chemical and Biochemical Engineering.
[22] Xi-Ren Cao,et al. A basic formula for online policy gradient algorithms , 2005, IEEE Transactions on Automatic Control.
[23] Martin A. Riedmiller. Neural Fitted Q Iteration - First Experiences with a Data Efficient Neural Reinforcement Learning Method , 2005, ECML.
[24] Pierre Geurts,et al. Tree-Based Batch Mode Reinforcement Learning , 2005, J. Mach. Learn. Res..
[25] Dimitri P. Bertsekas,et al. Dynamic Programming and Suboptimal Control: A Survey from ADP to MPC , 2005, Eur. J. Control.
[26] P. Bartlett,et al. Local Rademacher complexities , 2005, math/0508275.
[27] Richard S. Sutton,et al. Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.
[28] S. Boucheron,et al. Theory of classification: a survey of some recent advances , 2005.
[29] Pierre Geurts,et al. Extremely randomized trees , 2006, Machine Learning.
[30] Michael I. Jordan,et al. Convexity, Classification, and Risk Bounds , 2006 .
[31] Louis Wehenkel,et al. Clinical data based optimal STI strategies for HIV: a reinforcement learning approach , 2006, Proceedings of the 45th IEEE Conference on Decision and Control.
[32] Mohammad Ghavamzadeh,et al. Bayesian Policy Gradient Algorithms , 2006, NIPS.
[33] Daniel Polani,et al. Least Squares SVM for Least Squares TD Learning , 2006, ECAI.
[34] Stergios B. Fotopoulos,et al. All of Nonparametric Statistics , 2007, Technometrics.
[35] Csaba Szepesvári,et al. Fitted Q-iteration in continuous action-space MDPs , 2007, NIPS.
[36] Léon Bottou,et al. The Tradeoffs of Large Scale Learning , 2007, NIPS.
[37] Csaba Szepesvári,et al. Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path , 2006, Machine Learning.
[38] A. Tsybakov,et al. Fast learning rates for plug-in classifiers , 2007, 0708.2321.
[39] Vadim Bulitko,et al. Focus of Attention in Reinforcement Learning , 2007, J. Univers. Comput. Sci..
[40] Lihong Li,et al. Analyzing feature generation for value-function approximation , 2007, ICML '07.
[41] Rémi Munos,et al. Performance Bounds in Lp-norm for Approximate Value Iteration , 2007, SIAM J. Control. Optim..
[42] Sridhar Mahadevan,et al. Proto-value Functions: A Laplacian Framework for Learning Representation and Control in Markov Decision Processes , 2007, J. Mach. Learn. Res..
[43] Andreas Christmann,et al. Support vector machines , 2008, Data Mining and Knowledge Discovery Handbook.
[44] Christos Dimitrakakis,et al. Algorithms and Bounds for Rollout Sampling Approximate Policy Iteration , 2008, EWRL.
[45] Shie Mannor,et al. Regularized Policy Iteration , 2008, NIPS.
[46] Shalabh Bhatnagar,et al. Natural actor-critic algorithms , 2009, Autom..
[47] Gavin Taylor,et al. Kernelized value function approximation for reinforcement learning , 2009, ICML '09.
[48] Shie Mannor,et al. Regularized Fitted Q-Iteration for planning in continuous-space Markovian decision problems , 2009, 2009 American Control Conference.
[49] Andrew Y. Ng,et al. Regularization and feature selection in least-squares temporal difference learning , 2009, ICML '09.
[50] Shalabh Bhatnagar,et al. Fast gradient-descent methods for temporal-difference learning with linear function approximation , 2009, ICML '09.
[51] Alessandro Lazaric,et al. Analysis of a Classification-based Policy Iteration Algorithm , 2010, ICML.
[52] Csaba Szepesvári,et al. Model Selection in Reinforcement Learning , 2011, Machine Learning.
[53] Ronald Parr,et al. Linear Complementarity for Regularized Policy Evaluation and Improvement , 2010, NIPS.
[54] Bart De Schutter,et al. Approximate dynamic programming with a fuzzy parameterization , 2010, Autom..
[55] Csaba Szepesvári,et al. Error Propagation for Approximate Policy and Value Iteration , 2010, NIPS.
[56] Csaba Szepesvári,et al. Algorithms for Reinforcement Learning , 2010, Synthesis Lectures on Artificial Intelligence and Machine Learning.
[57] Matthew W. Hoffman,et al. Finite-Sample Analysis of Lasso-TD , 2011, ICML.
[58] Matthieu Geist,et al. Parametric value function approximation: A unified view , 2011, 2011 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL).
[59] Matthew W. Hoffman,et al. Regularized Least Squares Temporal Difference Learning with Nested ℓ2 and ℓ1 Penalization , 2011, EWRL.
[60] Matthieu Geist,et al. ℓ1-Penalized Projected Bellman Residual , 2011, EWRL.
[61] Geoffrey J. Gordon,et al. A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning , 2010, AISTATS.
[62] Csaba Szepesvári,et al. Regularization in reinforcement learning , 2011.
[63] Bruno Scherrer,et al. Classification-based Policy Iteration with a Critic , 2011, ICML.
[64] Amir-massoud Farahmand,et al. Action-Gap Phenomenon in Reinforcement Learning , 2011, NIPS.
[65] Alborz Geramifard,et al. Online Discovery of Feature Dependencies , 2011, ICML.
[66] Matthieu Geist,et al. Approximate Modified Policy Iteration , 2012, ICML.
[67] Alessandro Lazaric,et al. Conservative and Greedy Approaches to Classification-Based Policy Iteration , 2012, AAAI.
[68] Doina Precup,et al. Generalized Classification-based Approximate Policy Iteration , 2012.
[69] Csaba Szepesvári,et al. Statistical linear estimation with penalized estimators: an application to reinforcement learning , 2012, ICML.
[70] Alessandro Lazaric,et al. Finite-sample analysis of least-squares policy iteration , 2012, J. Mach. Learn. Res..
[71] Doina Precup,et al. Value Pursuit Iteration , 2012, NIPS.
[72] Bruno Scherrer,et al. On the Use of Non-Stationary Policies for Stationary Infinite-Horizon Markov Decision Processes , 2012, NIPS.
[73] Joelle Pineau,et al. Bellman Error Based Feature Generation using Random Projections on Sparse Spaces , 2013, NIPS.
[74] Amir-massoud Farahmand. CAPI: Generalized Classification-based Approximate Policy Iteration , 2013.
[75] Jan Peters,et al. Policy evaluation with temporal differences: a survey and comparison , 2015, J. Mach. Learn. Res..
[76] André da Motta Salles Barreto,et al. Classification-Based Approximate Policy Iteration , 2015, IEEE Transactions on Automatic Control.