Model-free Causal Reinforcement Learning with Causal Diagrams

We present a new model-free causal reinforcement learning approach that exploits the structure of causal diagrams, which can be learned through causal representation learning and causal discovery. Unlike the majority of work in causal reinforcement learning, which focuses on model-based approaches and off-policy evaluation, we explore a different direction: online model-free methods. We achieve this by extending a causal sequential decision-making formulation that combines the factored Markov decision process (FMDP) with the MDP with unobserved confounders (MDPUC), and by incorporating the notion of action as intervention. Extending the MDPUC addresses the issue of bidirectional arcs in learned causal diagrams, while the action-as-intervention idea allows high-level action models to be incorporated into the action space of an RL environment as vectors of interventions on the causal variables. We also present a value decomposition method based on the value-decomposition network architecture popular in multi-agent reinforcement learning, and report encouraging preliminary evaluation results.
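
To make the two central ideas concrete, here is a minimal sketch, not the authors' implementation: an action is encoded as a vector of per-variable interventions (with one index reserved for "do nothing"), and the total action value is decomposed additively, VDN-style, into per-variable Q-heads whose inputs are restricted to each variable's parents in the causal diagram, as in a factored MDP. All names (`FactoredVDN`, `parents`, `n_interventions_per_var`) are hypothetical, and the sketch assumes PyTorch.

```python
# A minimal sketch (not the paper's code) combining action-as-intervention
# with a VDN-style additive value decomposition: Q_tot = sum_i Q_i.
import torch
import torch.nn as nn


class FactoredVDN(nn.Module):
    """One Q-head per causal variable. Each head sees only that variable's
    parents in the (learned) causal diagram and scores the candidate
    interventions do(X_i = v) on that variable."""

    def __init__(self, parents, n_interventions_per_var, hidden=64):
        # parents[i] lists the state indices that are previous-step parents
        # of variable X_i in the causal diagram (factored-MDP style).
        super().__init__()
        self.parents = parents
        self.heads = nn.ModuleList(
            nn.Sequential(
                nn.Linear(len(pa), hidden),
                nn.ReLU(),
                nn.Linear(hidden, n_interventions_per_var),  # Q_i(s_pa(i), .)
            )
            for pa in parents
        )

    def forward(self, state):
        # state: (batch, n_state_vars) -> per-variable Q-values,
        # shape (batch, n_vars, n_interventions_per_var).
        qs = [head(state[:, pa]) for head, pa in zip(self.heads, self.parents)]
        return torch.stack(qs, dim=1)

    def q_total(self, state, action):
        # action: (batch, n_vars) integer vector; entry i selects the
        # intervention applied to X_i (index 0 could mean "no intervention").
        qs = self.forward(state)                               # (B, n, k)
        chosen = qs.gather(2, action.unsqueeze(-1)).squeeze(-1)
        return chosen.sum(dim=1)                               # VDN: sum of Q_i
```

A toy usage, under the same assumptions: with three causal variables where, say, X_1 has parents {X_0, X_1}, the joint action is just one intervention index per variable, and the decomposed Q-value is a scalar per batch element.

```python
parents = [[0], [0, 1], [1, 2]]
net = FactoredVDN(parents, n_interventions_per_var=4)
state = torch.randn(8, 3)
action = torch.randint(0, 4, (8, 3))     # one intervention index per variable
print(net.q_total(state, action).shape)  # torch.Size([8])
```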
