Preference-based reinforcement learning: a formal framework and a policy iteration algorithm
Eyke Hüllermeier | Johannes Fürnkranz | Sang-Hyeun Park | Weiwei Cheng