Hierarchical Reinforcement Learning in Factored MDPs

Abstract: In this paper, we propose a method for learning, by reinforcement, to solve a large Markovian problem whose structure is not known a priori. The method combines the hierarchical abstraction capabilities of Semi-Markov Decision Processes with the factorization capabilities of Factored Markov Decision Processes. We validate our approach on the classic taxi problem.

Keywords: reinforcement learning, factored models, options.
