Christos Dimitrakakis | Konstantinos Blekas | Nikolaos Tziortziotis
[1] Y. Shtarkov, et al. The context-tree weighting method: basic properties, 1995, IEEE Trans. Inf. Theory.
[2] Stuart J. Russell, et al. Bayesian Q-Learning, 1998, AAAI/IAAI.
[3] Steven J. Bradtke, et al. Linear Least-Squares Algorithms for Temporal Difference Learning, 2004, Machine Learning.
[4] Mohammad Ghavamzadeh, et al. Bayesian Policy Gradient Algorithms, 2006, NIPS.
[5] Tao Wang, et al. Bayesian sparse sampling for on-line reward optimization, 2005, ICML.
[6] Marc G. Bellemare, et al. Bayesian Learning of Recursively Factored Environments, 2013, ICML.
[7] Peter Auer, et al. Near-optimal Regret Bounds for Reinforcement Learning, 2008, J. Mach. Learn. Res.
[8] Joel Veness, et al. Context Tree Switching, 2012, Data Compression Conference.
[9] Georg Zeitler, et al. Universal Piecewise Linear Prediction Via Context Trees, 2007, IEEE Transactions on Signal Processing.
[10] Benjamin Van Roy, et al. Universal Reinforcement Learning, 2007, IEEE Transactions on Information Theory.
[11] Olivier Buffet, et al. Near-Optimal BRL using Optimistic Local Transitions, 2012, ICML.
[12] Csaba Szepesvári, et al. Bandit Based Monte-Carlo Planning, 2006, ECML.
[13] Nicholas Roy, et al. Provably Efficient Learning with Typed Parametric Models, 2009, J. Mach. Learn. Res.
[14] Rémi Munos, et al. Thompson Sampling: An Asymptotically Optimal Finite-Time Analysis, 2012, ALT.
[15] W. Wong, et al. Optional Pólya tree and Bayesian inference, 2010, arXiv:1010.0490.
[16] Doina Precup, et al. Smarter Sampling in Model-Based Bayesian Reinforcement Learning, 2010, ECML/PKDD.
[17] M. DeGroot. Optimal Statistical Decisions, 1970.
[18] R. R. Hocking, et al. Algorithm AS 53: Wishart Variate Generator, 1972.
[19] Martin L. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.
[20] Marcus Hutter, et al. Feature Reinforcement Learning using Looping Suffix Trees, 2012, EWRL.
[21] Jesse Hoey, et al. An analytic solution to discrete Bayesian reinforcement learning, 2006, ICML.
[22] Andrew G. Barto, et al. Optimal learning: computational procedures for Bayes-adaptive Markov decision processes, 2002.
[23] Christos Dimitrakakis, et al. Bayesian variable order Markov models, 2010, AISTATS.
[24] Michael I. Jordan, et al. Learning with Mixtures of Trees, 2001, J. Mach. Learn. Res.
[25] Csaba Szepesvári, et al. X-Armed Bandits, 2011, J. Mach. Learn. Res.
[26] Yoram Singer, et al. An Efficient Extension to Mixture Techniques for Prediction and Decision Trees, 1997, COLT '97.
[27] Steven de Rooij, et al. Catching Up Faster by Switching Sooner: A Prequential Solution to the AIC-BIC Dilemma, 2008, arXiv.
[28] Christos Dimitrakakis, et al. Linear Bayesian Reinforcement Learning, 2013, IJCAI.
[29] Shie Mannor, et al. Sparse Online Greedy Support Vector Regression, 2002, ECML.
[30] Malcolm J. A. Strens, et al. A Bayesian Framework for Reinforcement Learning, 2000, ICML.
[31] Michail G. Lagoudakis, et al. Least-Squares Policy Iteration, 2003, J. Mach. Learn. Res.
[32] W. R. Thompson. On the Likelihood that One Unknown Probability Exceeds Another in View of the Evidence of Two Samples, 1933.
[33] D. Bertsekas. Approximate policy iteration: a survey and some new methods, 2011.
[34] Carl E. Rasmussen, et al. Gaussian Processes in Reinforcement Learning, 2003, NIPS.
[35] Neil D. Lawrence, et al. Efficient Multioutput Gaussian Processes through Variational Inducing Kernels, 2010, AISTATS.
[36] D. Bertsekas. Dynamic Programming and Suboptimal Control: From ADP to MPC, 2005, Proceedings of the 44th IEEE Conference on Decision and Control.
[37] Lihong Li, et al. A Bayesian Sampling Approach to Exploration in Reinforcement Learning, 2009, UAI.
[38] T. Ferguson. Prior Distributions on Spaces of Probability Measures, 1974.
[39] Christos Dimitrakakis. Context model inference for large or partially observable MDPs, 2010, ICML.
[40] Joel Veness, et al. A Monte-Carlo AIXI Approximation, 2009, J. Artif. Intell. Res.
[41] Pierre Geurts, et al. Tree-Based Batch Mode Reinforcement Learning, 2005, J. Mach. Learn. Res.
[42] Christos Dimitrakakis, et al. Robust Bayesian Reinforcement Learning through Tight Lower Bounds, 2011, EWRL.
[43] Christos Dimitrakakis, et al. Complexity of Stochastic Branch and Bound Methods for Belief Tree Search in Bayesian Reinforcement Learning, 2009, ICAART.
[44] Csaba Szepesvári, et al. Model-based reinforcement learning with nearly tight exploration complexity bounds, 2010, ICML.
[45] John Langford, et al. Cover trees for nearest neighbor, 2006, ICML.
[46] Shie Mannor, et al. Reinforcement learning with Gaussian processes, 2005, ICML.
[47] Bart De Schutter, et al. Online least-squares policy iteration for reinforcement learning control, 2010, Proceedings of the 2010 American Control Conference.
[48] Carl E. Rasmussen, et al. PILCO: A Model-Based and Data-Efficient Approach to Policy Search, 2011, ICML.
[49] Carl E. Rasmussen, et al. Gaussian process dynamic programming, 2009, Neurocomputing.
[50] Jeremy Wyatt. Exploration and inference in learning from reinforcement, 1998.
[51] Justin A. Boyan. Technical Update: Least-Squares Temporal Difference Learning, 2002, Machine Learning.
[52] Shipra Agrawal, et al. Analysis of Thompson Sampling for the Multi-armed Bandit Problem, 2011, COLT.
[53] Ronald Ortner, et al. Online Regret Bounds for Undiscounted Continuous Reinforcement Learning, 2012, NIPS.
[54] Michael L. Littman, et al. Online Linear Regression and Its Application to Model-Based Reinforcement Learning, 2007, NIPS.
[55] Benjamin Van Roy, et al. (More) Efficient Reinforcement Learning via Posterior Sampling, 2013, NIPS.
[56] John N. Tsitsiklis, et al. Neuro-Dynamic Programming, 1996, Athena Scientific.
[57] Pieter Abbeel, et al. Exploration and apprenticeship learning in reinforcement learning, 2005, ICML.
[58] Ran El-Yaniv, et al. On Prediction Using Variable Order Markov Models, 2004, J. Artif. Intell. Res.
[59] Christos Dimitrakakis, et al. Beliefbox: A framework for statistical methods in sequential decision making, 2007.