Regularized Policy Iteration with Nonparametric Function Spaces
Amir-massoud Farahmand | Mohammad Ghavamzadeh | Csaba Szepesvári | Shie Mannor
[1] C. J. Stone, et al. Optimal Global Rates of Convergence for Nonparametric Regression, 1982.
[2] M. Nussbaum. Spline Smoothing in Regression Models and Asymptotic Efficiency in $L_2$, 1985.
[3] P. Schweitzer, et al. Generalized polynomial approximations in Markovian decision processes, 1985.
[4] M. Nussbaum, et al. A Risk Bound in Sobolev Class Regression, 1990.
[5] Y. C. Pati, et al. Orthogonal matching pursuit: recursive function approximation with applications to wavelet decomposition, 1993, Proceedings of 27th Asilomar Conference on Signals, Systems and Computers.
[6] Stéphane Mallat, et al. Matching pursuits with time-frequency dictionaries, 1993, IEEE Trans. Signal Process..
[7] Ronald J. Williams, et al. Tight Performance Bounds on Greedy Policies Based on Imperfect Value Functions, 1993.
[8] P. Doukhan. Mixing: Properties and Examples, 1994.
[9] Bin Yu. Rates of Convergence for Empirical Processes of Stationary Mixing Sequences, 1994.
[10] Leemon C. Baird, et al. Residual Algorithms: Reinforcement Learning with Function Approximation, 1995, ICML.
[11] John N. Tsitsiklis, et al. Neuro-Dynamic Programming, 1996, Encyclopedia of Machine Learning.
[12] R. Beatson, et al. A short course on fast multipole methods, 1997.
[13] R. DeVore, et al. Nonlinear approximation, 1998, Acta Numerica.
[14] Richard S. Sutton, et al. Introduction to Reinforcement Learning, 1998.
[15] Yuhong Yang, et al. Information-theoretic determination of minimax rates of convergence, 1999.
[16] Paul-Marie Samson, et al. Concentration of measure inequalities for Markov chains and $\Phi$-mixing processes, 2000.
[17] M. Kohler. Inequalities for uniform deviations of averages from expectations with applications to nonparametric regression, 2000.
[18] S. Geer. Empirical Processes in M-Estimation, 2000.
[19] A. E. Hoerl, et al. Ridge regression: biased estimation for nonorthogonal problems, 2000.
[20] Tomaso A. Poggio, et al. Regularization Networks and Support Vector Machines, 2000, Adv. Comput. Math..
[21] Bernhard Schölkopf, et al. A Generalized Representer Theorem, 2001, COLT/EuroCOLT.
[22] S. R. Jammalamadaka, et al. Empirical Processes in M-Estimation, 2001.
[23] Ding-Xuan Zhou, et al. The covering number in learning theory, 2002, J. Complex..
[24] Tong Zhang, et al. Covering Number Bounds of Certain Regularized Linear Function Classes, 2002, J. Mach. Learn. Res..
[25] Adam Krzyzak, et al. A Distribution-Free Theory of Nonparametric Regression, 2002, Springer Series in Statistics.
[26] S. Smale, et al. Estimating the Approximation Error in Learning Theory, 2003.
[27] Rémi Munos, et al. Error Bounds for Approximate Policy Iteration, 2003, ICML.
[28] Ding-Xuan Zhou, et al. Capacity of reproducing kernel spaces in learning theory, 2003, IEEE Transactions on Information Theory.
[29] Michail G. Lagoudakis, et al. Least-Squares Policy Iteration, 2003, J. Mach. Learn. Res..
[30] R. Tibshirani, et al. Least angle regression, 2004, math/0406456.
[31] Larry S. Davis, et al. Efficient Kernel Machines Using the Improved Fast Gauss Transform, 2004, NIPS.
[32] Steven J. Bradtke, et al. Linear Least-Squares algorithms for temporal difference learning, 2004, Machine Learning.
[33] Pierre Geurts, et al. Tree-Based Batch Mode Reinforcement Learning, 2005, J. Mach. Learn. Res..
[34] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.
[35] Shie Mannor, et al. Reinforcement learning with Gaussian processes, 2005, ICML.
[36] Mikhail Belkin, et al. Manifold Regularization: A Geometric Framework for Learning from Labeled and Unlabeled Examples, 2006, J. Mach. Learn. Res..
[37] Liming Xiang, et al. Kernel-Based Reinforcement Learning, 2006, ICIC.
[38] Larry Wasserman, et al. All of Nonparametric Statistics (Springer Texts in Statistics), 2006.
[39] Koby Crammer, et al. Analysis of Representations for Domain Adaptation, 2006, NIPS.
[40] M. Nussbaum. Minimax Risk, Pinsker Bound for, 2006.
[41] Daniel Polani, et al. Least Squares SVM for Least Squares TD Learning, 2006, ECAI.
[42] Stergios B. Fotopoulos, et al. All of Nonparametric Statistics, 2007, Technometrics.
[43] Xin Xu, et al. Kernel-Based Least Squares Policy Iteration for Reinforcement Learning, 2007, IEEE Transactions on Neural Networks.
[44] Csaba Szepesvári, et al. Fitted Q-iteration in continuous action-space MDPs, 2007, NIPS.
[45] Léon Bottou, et al. The Tradeoffs of Large Scale Learning, 2007, NIPS.
[46] Dimitri P. Bertsekas, et al. Stochastic optimal control: the discrete time case, 2007.
[47] Csaba Szepesvári, et al. Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path, 2006, Machine Learning.
[48] Marek Petrik, et al. An Analysis of Laplacian Methods for Value Function Approximation in MDPs, 2007, IJCAI.
[49] M. Loth, et al. Sparse Temporal Difference Learning Using LASSO, 2007, 2007 IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning.
[50] Lihong Li, et al. Analyzing feature generation for value-function approximation, 2007, ICML '07.
[51] Rémi Munos, et al. Performance Bounds in Lp-norm for Approximate Value Iteration, 2007, SIAM J. Control. Optim..
[52] Sridhar Mahadevan, et al. Proto-value Functions: A Laplacian Framework for Learning Representation and Control in Markov Decision Processes, 2007, J. Mach. Learn. Res..
[53] Andreas Christmann, et al. Support vector machines, 2008, Data Mining and Knowledge Discovery Handbook.
[54] Nathan Srebro, et al. SVM optimization: inverse dependence on training set size, 2008, ICML '08.
[55] H. Triebel. Theory of Function Spaces III, 2008.
[56] Shie Mannor, et al. Regularized Policy Iteration, 2008, NIPS.
[57] Don R. Hush, et al. Optimal Rates for Regularized Least Squares Regression, 2009, COLT.
[58] Gavin Taylor, et al. Kernelized value function approximation for reinforcement learning, 2009, ICML '09.
[59] Alexandre B. Tsybakov, et al. Introduction to Nonparametric Estimation, 2008, Springer Series in Statistics.
[60] Shie Mannor, et al. Regularized Fitted Q-Iteration for planning in continuous-space Markovian decision problems, 2009, 2009 American Control Conference.
[61] Andrew Y. Ng, et al. Regularization and feature selection in least-squares temporal difference learning, 2009, ICML '09.
[62] Shie Mannor, et al. Regularized Fitted Q-iteration: Application to Planning, 2008, EWRL.
[63] Koby Crammer, et al. A theory of learning from different domains, 2010, Machine Learning.
[64] Yishay Mansour, et al. Domain Adaptation: Learning Bounds and Algorithms, 2009, COLT.
[65] Shalabh Bhatnagar, et al. Fast gradient-descent methods for temporal-difference learning with linear function approximation, 2009, ICML '09.
[66] B. Nadler, et al. Semi-supervised learning with the graph Laplacian: the limit of infinite unlabelled data, 2009, NIPS.
[67] Csaba Szepesvári, et al. Model Selection in Reinforcement Learning, 2011, Machine Learning.
[68] Ronald Parr, et al. Linear Complementarity for Regularized Policy Evaluation and Improvement, 2010, NIPS.
[69] Bo Liu, et al. Basis Construction from Power Series Expansions of Value Functions, 2010, NIPS.
[70] Csaba Szepesvári, et al. Error Propagation for Approximate Policy and Value Iteration, 2010, NIPS.
[71] Csaba Szepesvári, et al. Algorithms for Reinforcement Learning, 2010, Synthesis Lectures on Artificial Intelligence and Machine Learning.
[72] S. Mahadevan, et al. Basis construction and utilization for Markov decision processes using graphs, 2010.
[73] Matthew W. Hoffman, et al. Finite-Sample Analysis of Lasso-TD, 2011, ICML.
[74] Matthew W. Hoffman, et al. Regularized Least Squares Temporal Difference Learning with Nested ℓ2 and ℓ1 Penalization, 2011, EWRL.
[75] Matthieu Geist, et al. ℓ1-Penalized Projected Bellman Residual, 2011, EWRL.
[76] André da Motta Salles Barreto, et al. Reinforcement Learning using Kernel-Based Stochastic Factorization, 2011, NIPS.
[77] Geoffrey J. Gordon, et al. A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning, 2010, AISTATS.
[78] Csaba Szepesvári, et al. Regularization in reinforcement learning, 2011.
[79] George Konidaris, et al. Value Function Approximation in Reinforcement Learning Using the Fourier Basis, 2011, AAAI.
[80] Amir Massoud Farahmand, et al. Action-Gap Phenomenon in Reinforcement Learning, 2011, NIPS.
[81] Alborz Geramifard, et al. Online Discovery of Feature Dependencies, 2011, ICML.
[82] Csaba Szepesvári, et al. Regularized least-squares regression: Learning from a β-mixing sequence, 2012.
[83] André da Motta Salles Barreto, et al. On-line Reinforcement Learning Using Incremental Kernel-Based Stochastic Factorization, 2012, NIPS.
[84] Roman Vershynin, et al. Introduction to the non-asymptotic analysis of random matrices, 2010, Compressed Sensing.
[85] Bo Liu, et al. Regularized Off-Policy TD-Learning, 2012, NIPS.
[86] Ronald Parr, et al. Greedy Algorithms for Sparse Reinforcement Learning, 2012, ICML.
[87] Csaba Szepesvári, et al. Statistical linear estimation with penalized estimators: an application to reinforcement learning, 2012, ICML.
[88] Alessandro Lazaric, et al. Finite-sample analysis of least-squares policy iteration, 2012, J. Mach. Learn. Res..
[89] Doina Precup, et al. Value Pursuit Iteration, 2012, NIPS.
[90] Joelle Pineau, et al. Bellman Error Based Feature Generation using Random Projections on Sparse Spaces, 2013, NIPS.
[91] Alborz Geramifard, et al. Batch-iFDD for Representation Expansion in Large MDPs, 2013, UAI.
[92] Klaus Obermayer, et al. Construction of approximation spaces for reinforcement learning, 2013, J. Mach. Learn. Res..
[93] Zhiwei Qin, et al. Sparse Reinforcement Learning via Convex Optimization, 2014, ICML.
[94] Jan Peters, et al. Policy evaluation with temporal differences: a survey and comparison, 2015, J. Mach. Learn. Res..
[95] Philip Bachman, et al. Sample-based approximate regularization, 2014, ICML.
[96] André da Motta Salles Barreto, et al. Classification-Based Approximate Policy Iteration, 2015, IEEE Transactions on Automatic Control.
[97] Mehryar Mohri, et al. Adaptation Algorithm and Theory Based on Generalized Discrepancy, 2014, KDD.