Feedback-Guided Intention Scheduling for BDI Agents

Intelligent agents, such as those based on the popular BDI (belief–desire–intention) paradigm, typically pursue multiple goals in parallel. An intention scheduler is required to reason about the possible interactions between the agent's intentions so as to maximize some utility. An important consideration when scheduling intentions is the user's preferences over the goals and over the ways in which those goals are achieved. Such preferences are generally unknown in advance, time-consuming to elicit, hard to model, and difficult to incorporate into an intention scheduler. In this paper, we present a Monte Carlo Tree Search based intention scheduler (pref-MCTS) that learns the user's preferences over intention schedules via low-burden comparative queries. It incorporates the learned preferences to guide the search, producing execution policies that are optimized towards the user's preferences and expectations. Evaluation with an artificial oracle shows that pref-MCTS improves over state-of-the-art baselines, even when provided with a limited number of preference queries and noisy labels. A user study further shows that pref-MCTS learns user preferences and generates schedules preferred by users in real time.
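The two ingredients the abstract describes — learning preferences from low-burden comparative queries and using them to bias the scheduler's search — can be illustrated with a minimal sketch. This is not the paper's implementation: it assumes a standard Bradley-Terry model fitted to pairwise "A preferred over B" answers, and a hypothetical UCB-style selection rule that blends the learned preference score into the exploitation term.

```python
import math

def bradley_terry_fit(comparisons, n_items, lr=0.05, epochs=500):
    """Fit latent utilities from pairwise (winner, loser) comparisons by
    gradient ascent on the Bradley-Terry log-likelihood."""
    theta = [0.0] * n_items
    for _ in range(epochs):
        for winner, loser in comparisons:
            # P(winner beats loser) under the current utilities
            p = 1.0 / (1.0 + math.exp(theta[loser] - theta[winner]))
            g = 1.0 - p  # gradient of the log-likelihood w.r.t. theta[winner]
            theta[winner] += lr * g
            theta[loser] -= lr * g
    return theta

def ucb_score(child_value, child_visits, parent_visits, pref, c=1.4, w=0.5):
    """UCB1 selection score with an added preference bonus: one simple way
    to bias MCTS node selection toward schedules the user is predicted to
    prefer. The weight w trades off utility against learned preference."""
    if child_visits == 0:
        return float("inf")  # always expand unvisited children first
    exploit = child_value / child_visits
    explore = c * math.sqrt(math.log(parent_visits) / child_visits)
    return exploit + explore + w * pref

# Demo: a noiseless oracle prefers schedule 0 over 1 over 2.
comparisons = [(0, 1), (0, 2), (1, 2)] * 20
theta = bradley_terry_fit(comparisons, n_items=3)
```

With noisy labels, occasional inverted pairs simply shrink the gaps between the fitted utilities rather than breaking the ranking, which is one reason pairwise models are a natural fit for this kind of elicitation.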
