Efficient learning of relational models for sequential decision making

The exploration-exploitation tradeoff is crucial to reinforcement-learning (RL) agents, and many sample-complexity results have been derived for agents in propositional domains. These results guarantee, with high probability, near-optimal behavior in all but a polynomial number of timesteps in the agent’s lifetime. In this work, we prove similar results for certain relational representations, primarily a class we call “relational action schemas”. These generalized models specify state transitions in a compact form, for instance describing the effect of picking up a generic block rather than each of 10 specific blocks. We present theoretical results on crucial subproblems in action-schema learning using the KWIK framework, which allows us to characterize the sample efficiency of an agent learning these models in a reinforcement-learning setting. We extend these results to an apprenticeship-learning paradigm in which an agent has access not only to its environment, but also to a teacher that can demonstrate traces of state/action/state sequences. We show that the class of action schemas that is efficiently learnable in this paradigm is strictly larger than the class learnable in the online setting, and we link the efficiently learnable dynamics in the apprenticeship setting to a rich class of models derived from well-known learning frameworks. As an application, we present theoretical and empirical results on learning relational models of web-service descriptions, using a dataflow model called a Task Graph to capture the important connections between the inputs and outputs of services in a workflow; the experiments are constructed from publicly available web services. This application shows that compact relational models can be learned efficiently from limited amounts of basic data. Finally, we present several extensions of the main results of the thesis, including enriching the representation languages with Description Logics, and we explore the use of sample-based planners to speed up the computation time of our algorithms.

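To make the compactness claim concrete, the following is a minimal STRIPS-style sketch in Python. The names used here (`Schema`, `pickup`, the predicate strings) are illustrative assumptions, not the thesis's actual formalism; the point is only that a single parameterized schema covers every block, where a propositional encoding would need a separate rule for each of the 10 blocks.

```python
class Schema:
    """A minimal STRIPS-style relational action schema (illustrative sketch only)."""

    def __init__(self, name, params, preconds, add, delete):
        self.name, self.params = name, params
        self.preconds, self.add, self.delete = preconds, add, delete

    def ground(self, binding):
        """Substitute objects for variables in every literal set."""
        sub = lambda lits: {tuple(binding.get(t, t) for t in lit) for lit in lits}
        return sub(self.preconds), sub(self.add), sub(self.delete)

    def apply(self, state, binding):
        """Return the successor state, or None if the preconditions fail."""
        pre, add, delete = self.ground(binding)
        if not pre <= state:
            return None
        return (state - delete) | add


# One schema describes picking up *any* clear block -- the compactness the
# abstract contrasts with ten block-specific propositional rules.
pickup = Schema(
    name="pickup", params=("?b",),
    preconds={("clear", "?b"), ("on-table", "?b"), ("hand-empty",)},
    add={("holding", "?b")},
    delete={("clear", "?b"), ("on-table", "?b"), ("hand-empty",)},
)

state = {("clear", "block3"), ("on-table", "block3"), ("hand-empty",)}
print(pickup.apply(state, {"?b": "block3"}))
# {('holding', 'block3')}
```

A learner in the KWIK setting would acquire the precondition and effect sets of such schemas from observed state/action/state transitions, predicting the successor state when the learned schema determines it and answering "I don't know" otherwise.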