A Survey of Knowledge-based Sequential Decision Making under Uncertainty

Reasoning with declarative knowledge (RDK) and sequential decision making (SDM) are two key research areas in artificial intelligence. RDK methods reason with declarative domain knowledge, including commonsense knowledge, that is either provided a priori or acquired over time. SDM methods, such as probabilistic planning and reinforcement learning, seek to compute action policies that maximize the expected cumulative utility over a time horizon; both classes of methods reason in the presence of uncertainty. Despite the rich literature in these two areas, researchers have not fully explored their complementary strengths. In this paper, we survey algorithms that leverage RDK methods while making sequential decisions under uncertainty. We discuss significant developments, open problems, and directions for future work.
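
For concreteness, the SDM objective referred to above can be stated as a standard expected-return maximization. The following is a minimal formalization assuming a Markov decision process with per-step reward r_t, discount factor \gamma, and horizon H; these symbols are illustrative conventions and are not fixed by the abstract itself:

\[
  % Optimal policy: maximize the expected cumulative (discounted) utility
  % over the horizon H, assuming an MDP with reward r_t and discount \gamma.
  \pi^{*} \;=\; \arg\max_{\pi}\; \mathbb{E}_{\pi}\!\left[\sum_{t=0}^{H} \gamma^{t}\, r_{t}\right]
\]

Probabilistic planning computes \pi^{*} from a given transition and reward model, whereas reinforcement learning estimates it from experience; partially observable settings replace states with belief distributions but keep the same objective.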
