Nan Jiang | Akshay Krishnamurthy | Alekh Agarwal | John Langford | Robert E. Schapire
[1] Alessandro Lazaric,et al. Finite-sample analysis of least-squares policy iteration , 2012, J. Mach. Learn. Res..
[2] László Györfi,et al. A Probabilistic Theory of Pattern Recognition , 1996, Stochastic Modelling and Applied Probability.
[3] Michael I. Jordan,et al. PEGASUS: A policy search method for large MDPs and POMDPs , 2000, UAI.
[4] Demis Hassabis,et al. Mastering the game of Go with deep neural networks and tree search , 2016, Nature.
[5] Martin L. Puterman,et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming , 1994 .
[6] Michael L. Littman,et al. A unifying framework for computational reinforcement learning theory , 2009 .
[7] André da Motta Salles Barreto,et al. Reinforcement Learning using Kernel-Based Stochastic Factorization , 2011, NIPS.
[8] Ronen I. Brafman,et al. R-MAX - A General Polynomial Time Algorithm for Near-Optimal Reinforcement Learning , 2001, J. Mach. Learn. Res..
[9] John Langford,et al. PAC Reinforcement Learning with Rich Observations , 2016, NIPS.
[10] J. Lamperti. On Convergence of Stochastic Processes , 1962 .
[11] Michail G. Lagoudakis,et al. Least-Squares Policy Iteration , 2003, J. Mach. Learn. Res..
[12] Satinder Singh,et al. An upper bound on the loss from approximate optimal-value functions , 1994, Machine Learning.
[13] David Haussler,et al. Decision Theoretic Generalizations of the PAC Model for Neural Net and Other Learning Applications , 1992, Inf. Comput..
[14] Richard S. Sutton,et al. Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.
[15] Sham M. Kakade,et al. On the sample complexity of reinforcement learning , 2003 .
[16] Shie Mannor,et al. Contextual Markov Decision Processes , 2015, ArXiv.
[17] Csaba Szepesvári,et al. Error Propagation for Approximate Policy and Value Iteration , 2010, NIPS.
[18] Tom Schaul,et al. Unifying Count-Based Exploration and Intrinsic Motivation , 2016, NIPS.
[19] Tom Schaul,et al. Dueling Network Architectures for Deep Reinforcement Learning , 2015, ICML.
[20] Philip M. Long,et al. A Generalization of Sauer's Lemma , 1995, J. Comb. Theory, Ser. A.
[21] Christoph Dann,et al. Sample Complexity of Episodic Fixed-Horizon Reinforcement Learning , 2015, NIPS.
[22] Csaba Szepesvári,et al. Bandit Based Monte-Carlo Planning , 2006, ECML.
[23] Balas K. Natarajan,et al. On learning sets and functions , 2004, Machine Learning.
[24] Yishay Mansour,et al. A Sparse Sampling Algorithm for Near-Optimal Planning in Large Markov Decision Processes , 1999, Machine Learning.
[25] Marcus Hutter,et al. Universal Artificial Intelligence - Sequential Decisions Based on Algorithmic Probability , 2005, Texts in Theoretical Computer Science. An EATCS Series.
[26] Peter Auer,et al. The Nonstochastic Multiarmed Bandit Problem , 2002, SIAM J. Comput..
[27] Katja Hofmann,et al. The Malmo Platform for Artificial Intelligence Experimentation , 2016, IJCAI.
[28] Richard S. Sutton,et al. Predictive Representations of State , 2001, NIPS.
[29] Michael R. James,et al. Predictive State Representations: A New Theory for Modeling Dynamical Systems , 2004, UAI.
[30] Rémi Munos,et al. Error Bounds for Approximate Policy Iteration , 2003, ICML.
[31] Kamyar Azizzadenesheli,et al. Reinforcement Learning of POMDPs using Spectral Methods , 2016, COLT.
[32] Philip M. Long,et al. Characterizations of Learnability for Classes of {0, ..., n}-Valued Functions , 1995, J. Comput. Syst. Sci..
[33] Marcel Paul Schützenberger,et al. On the Definition of a Family of Automata , 1961, Inf. Control..
[34] Michael J. Todd. On Minimum Volume Ellipsoids Containing Part of a Given Ellipsoid , 1982, Math. Oper. Res..
[35] Zheng Wen,et al. Efficient Exploration and Value Function Generalization in Deterministic Systems , 2013, NIPS.
[36] John Langford,et al. Taming the Monster: A Fast and Simple Algorithm for Contextual Bandits , 2014, ICML.
[37] B. Anderson,et al. Optimal control: linear quadratic methods , 1990 .
[38] Benjamin Van Roy,et al. Eluder Dimension and the Sample Complexity of Optimistic Exploration , 2013, NIPS.
[39] John Langford,et al. Exploration in Metric State Spaces , 2003, ICML.
[40] Jason Pazis,et al. Efficient PAC-Optimal Exploration in Concurrent, Continuous State MDPs with Delayed Updates , 2016, AAAI.
[41] Byron Boots,et al. Closing the learning-planning loop with predictive state representations , 2009, Int. J. Robotics Res..
[42] Michael J. Todd,et al. The Ellipsoid Method: A Survey , 1980 .
[43] Csaba Szepesvári,et al. Finite-Time Bounds for Fitted Value Iteration , 2008, J. Mach. Learn. Res..
[44] John Langford,et al. The Epoch-Greedy Algorithm for Multi-armed Bandits with Side Information , 2007, NIPS.
[45] Shane Legg,et al. Human-level control through deep reinforcement learning , 2015, Nature.
[46] John Langford,et al. Efficient Optimal Learning for Contextual Bandits , 2011, UAI.
[47] Benjamin Van Roy,et al. Model-based Reinforcement Learning and the Eluder Dimension , 2014, NIPS.
[48] Lihong Li,et al. PAC model-free reinforcement learning , 2006, ICML.
[49] Peter Stone,et al. Model-Based Exploration in Continuous State Spaces , 2007, SARA.
[50] D. Panchenko. Some Extensions of an Inequality of Vapnik and Chervonenkis , 2002, math/0405342.
[51] David Haussler,et al. Sphere Packing Numbers for Subsets of the Boolean n-Cube with Bounded Vapnik-Chervonenkis Dimension , 1995, J. Comb. Theory, Ser. A.
[52] Csaba Szepesvári,et al. Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path , 2006, Machine Learning.
[53] John Langford,et al. Contextual Bandit Learning with Predictable Rewards , 2012, AISTATS.
[54] Michael Kearns,et al. Near-Optimal Reinforcement Learning in Polynomial Time , 1998, Machine Learning.
[55] Pierre Geurts,et al. Tree-Based Batch Mode Reinforcement Learning , 2005, J. Mach. Learn. Res..
[56] André da Motta Salles Barreto,et al. Policy Iteration Based on Stochastic Factorization , 2014, J. Artif. Intell. Res..
[57] Michael J. Todd,et al. On Khachiyan's algorithm for the computation of minimum-volume enclosing ellipsoids , 2007, Discret. Appl. Math..