Model Selection in Reinforcement Learning
[1] M. Madkour. Nonlinear Least Squares Algorithm , 1972 .
[2] Michael I. Jordan,et al. Advances in Neural Information Processing Systems 30 , 1995 .
[3] Andrew R. Barron,et al. Complexity Regularization with Application to Artificial Neural Networks , 1991 .
[4] Richard L. Tweedie,et al. Markov Chains and Stochastic Stability , 1993, Communications and Control Engineering Series.
[5] Martin L. Puterman,et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming , 1994 .
[6] John N. Tsitsiklis,et al. Neuro-Dynamic Programming , 1996, Encyclopedia of Machine Learning.
[7] Yuhong Yang. MODEL SELECTION FOR NONPARAMETRIC REGRESSION , 1997 .
[8] Dharmendra S. Modha,et al. Memory-Universal Prediction of Stationary Random Processes , 1998, IEEE Trans. Inf. Theory.
[9] Paul-Marie Samson,et al. Concentration of measure inequalities for Markov chains and $\Phi$-mixing processes , 2000 .
[10] Louis Wehenkel,et al. Application of Reinforcement Learning to Electrical Power System Closed-Loop Emergency Control , 2000, PKDD.
[11] Michael I. Jordan,et al. PEGASUS: A policy search method for large MDPs and POMDPs , 2000, UAI.
[12] S. R. Jammalamadaka,et al. Empirical Processes in M-Estimation , 2001 .
[13] Csaba Szepesvári,et al. Efficient approximate planning in continuous space Markovian Decision Problems , 2001, AI Commun..
[14] Ronen I. Brafman,et al. R-MAX - A General Polynomial Time Algorithm for Near-Optimal Reinforcement Learning , 2001, J. Mach. Learn. Res..
[15] Adam Krzyzak,et al. A Distribution-Free Theory of Nonparametric Regression , 2002, Springer series in statistics.
[16] M. Wegkamp. Model selection in nonparametric regression , 2003 .
[17] Michail G. Lagoudakis,et al. Least-Squares Policy Iteration , 2003, J. Mach. Learn. Res..
[18] László Györfi,et al. The estimation problem of minimum mean squared error , 2003 .
[19] Peter L. Bartlett,et al. Model Selection and Error Estimation , 2000, Machine Learning.
[20] Steven J. Bradtke,et al. Linear Least-Squares algorithms for temporal difference learning , 2004, Machine Learning.
[21] D. Ruppert. The Elements of Statistical Learning: Data Mining, Inference, and Prediction , 2004 .
[22] Ron Meir,et al. Nonparametric Time Series Prediction Through Adaptive Model Selection , 2000, Machine Learning.
[23] G. Lugosi,et al. Complexity regularization via localized random penalties , 2004, math/0410091.
[24] Justin A. Boyan,et al. Technical Update: Least-Squares Temporal Difference Learning , 2002, Machine Learning.
[25] Martin A. Riedmiller. Neural Fitted Q Iteration - First Experiences with a Data Efficient Neural Reinforcement Learning Method , 2005, ECML.
[26] Pierre Geurts,et al. Tree-Based Batch Mode Reinforcement Learning , 2005, J. Mach. Learn. Res..
[27] Shie Mannor,et al. Basis Function Adaptation in Temporal Difference Reinforcement Learning , 2005, Ann. Oper. Res..
[28] P. Bartlett,et al. Local Rademacher complexities , 2005, math/0508275.
[29] Richard S. Sutton,et al. Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.
[30] Shie Mannor,et al. Reinforcement learning with Gaussian processes , 2005, ICML.
[31] Pieter Abbeel,et al. An Application of Reinforcement Learning to Aerobatic Helicopter Flight , 2006, NIPS.
[32] Shimon Whiteson,et al. Evolutionary Function Approximation for Reinforcement Learning , 2006, J. Mach. Learn. Res..
[33] D. Hinkley. Annals of Statistics , 2006 .
[34] A. V. D. Vaart,et al. Oracle inequalities for multi-fold cross validation , 2006 .
[35] Shie Mannor,et al. Automatic basis function construction for approximate dynamic programming and reinforcement learning , 2006, ICML.
[36] Ambuj Tewari,et al. Sample Complexity of Policy Search with Known Dynamics , 2006, NIPS.
[37] Larry Wasserman,et al. All of Nonparametric Statistics (Springer Texts in Statistics) , 2006 .
[38] Csaba Szepesvári,et al. Learning Near-Optimal Policies with Bellman-Residual Minimization Based Fitted Policy Iteration and a Single Sample Path , 2006, COLT.
[39] Daniel Polani,et al. Least Squares SVM for Least Squares TD Learning , 2006, ECAI.
[40] A. Antos,et al. Value-Iteration Based Fitted Policy Iteration: Learning with a Single Trajectory , 2007, 2007 IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning.
[41] Xin Xu,et al. Kernel-Based Least Squares Policy Iteration for Reinforcement Learning , 2007, IEEE Transactions on Neural Networks.
[42] Csaba Szepesvári,et al. Fitted Q-iteration in continuous action-space MDPs , 2007, NIPS.
[43] Radford M. Neal. Pattern Recognition and Machine Learning , 2007, Technometrics.
[44] Dimitri P. Bertsekas,et al. Stochastic optimal control : the discrete time case , 2007 .
[45] Michael L. Littman,et al. Online Linear Regression and Its Application to Model-Based Reinforcement Learning , 2007, NIPS.
[46] Csaba Szepesvári,et al. Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path , 2006, Machine Learning.
[47] B. Schölkopf,et al. Sample complexity of policy search with known dynamics , 2007 .
[48] M. Loth,et al. Sparse Temporal Difference Learning Using LASSO , 2007, 2007 IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning.
[49] Lihong Li,et al. Analyzing feature generation for value-function approximation , 2007, ICML '07.
[50] Rémi Munos,et al. Performance Bounds in Lp-norm for Approximate Value Iteration , 2007, SIAM J. Control. Optim..
[51] Csaba Szepesvári,et al. Empirical Bernstein stopping , 2008, ICML '08.
[52] Alborz Geramifard,et al. Dyna-Style Planning with Linear Function Approximation and Prioritized Sweeping , 2008, UAI.
[53] Shie Mannor,et al. Regularized Policy Iteration , 2008, NIPS.
[54] Gavin Taylor,et al. Kernelized value function approximation for reinforcement learning , 2009, ICML '09.
[55] Shie Mannor,et al. Regularized Fitted Q-Iteration for planning in continuous-space Markovian decision problems , 2009, 2009 American Control Conference.
[56] Andrew Y. Ng,et al. Regularization and feature selection in least-squares temporal difference learning , 2009, ICML '09.
[57] Carl E. Rasmussen,et al. Gaussian processes for machine learning , 2005, Adaptive computation and machine learning.
[58] Csaba Szepesvári,et al. Model-based and Model-free Reinforcement Learning for Visual Servoing , 2009, 2009 IEEE International Conference on Robotics and Automation.
[59] B. Nadler,et al. Semi-supervised learning with the graph Laplacian: the limit of infinite unlabelled data , 2009, NIPS 2009.
[60] Sylvain Arlot,et al. A survey of cross-validation procedures for model selection , 2009, 0907.4728.
[61] Csaba Szepesvári,et al. Algorithms for Reinforcement Learning , 2010, Synthesis Lectures on Artificial Intelligence and Machine Learning.
[62] Csaba Szepesvári,et al. Reinforcement Learning Algorithms for MDPs , 2011 .
[63] Csaba Szepesvári,et al. Regularized least-squares regression: Learning from a β-mixing sequence , 2012 .