Efficient model-based exploration in continuous state-space environments