Ordered Preference Elicitation Strategies for Supporting Multi-Objective Decision Making

In multi-objective decision planning and learning, much attention is paid to producing optimal solution sets that contain an optimal policy for every possible user preference profile. We argue that the step that follows, i.e., determining which policy to execute by maximising the user's intrinsic utility function over this (possibly infinite) set, is under-studied. This paper aims to fill this gap. We build on previous work on Gaussian processes and pairwise comparisons for preference modelling, extend it to the multi-objective decision support scenario, and propose new ordered preference elicitation strategies based on ranking and clustering. Our main contribution is an in-depth evaluation of these strategies using computer- and human-based experiments. We show that our proposed elicitation strategies outperform the currently used pairwise methods, and that users prefer ranking most. Our experiments further show that exploiting monotonicity information in the GPs, via a linear prior mean at the start and virtual comparisons to the nadir and ideal points, increases performance. We demonstrate our decision support framework in a real-world study on traffic regulation, conducted with the city of Amsterdam.
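The two ingredients highlighted above, a linear prior mean that encodes monotonicity in the objectives and virtual pairwise comparisons to the nadir and ideal points, can be illustrated with a small sketch. The code below is a minimal, illustrative example and not the paper's implementation: it fits a MAP estimate of latent utilities under a GP prior with linear mean and a probit pairwise-comparison likelihood (in the spirit of Chu and Ghahramani's preference learning), and augments the user's comparisons with virtual ones against the nadir and ideal points. All candidate values, kernel parameters, and weights are assumed for illustration.

```python
# Minimal sketch (not the authors' exact method) of GP pairwise preference
# learning with a linear prior mean and virtual nadir/ideal comparisons.
import numpy as np
from scipy.stats import norm

def rbf_kernel(X, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel over multi-objective value vectors."""
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return variance * np.exp(-0.5 * d2 / lengthscale ** 2)

def map_utilities(X, comparisons, w, noise=0.1, iters=500, lr=0.01, jitter=1e-6):
    """MAP estimate of latent utilities f under a GP prior with linear mean
    m(x) = w . x and a probit likelihood over pairwise comparisons (i beats j)."""
    n = X.shape[0]
    m = X @ w                      # linear prior mean encodes monotonicity
    K = rbf_kernel(X) + jitter * np.eye(n)
    K_inv = np.linalg.inv(K)
    f = m.copy()                   # start the search at the prior mean
    s = np.sqrt(2.0) * noise
    for _ in range(iters):
        grad = -K_inv @ (f - m)    # gradient of the GP log-prior
        for i, j in comparisons:   # gradient of the probit log-likelihood
            z = (f[i] - f[j]) / s
            g = norm.pdf(z) / max(norm.cdf(z), 1e-12) / s
            grad[i] += g
            grad[j] -= g
        f += lr * grad             # simple gradient ascent to the MAP point
    return f

# Candidate policy value vectors (two objectives, higher is better) -- assumed.
X = np.array([[0.2, 0.9], [0.5, 0.5], [0.9, 0.1]])
nadir, ideal = X.min(axis=0), X.max(axis=0)
X_aug = np.vstack([X, nadir, ideal])
n, i_nadir, i_ideal = len(X), len(X), len(X) + 1

# One real user comparison plus virtual comparisons: every candidate
# beats the nadir point and loses to the ideal point.
comparisons = [(0, 1)]
comparisons += [(k, i_nadir) for k in range(n)] + [(i_ideal, k) for k in range(n)]

f = map_utilities(X_aug, comparisons, w=np.array([0.5, 0.5]))
print("estimated utilities of candidates:", f[:n])
```

The virtual comparisons bound the latent utility surface between the nadir and ideal points, while the linear prior mean pulls the estimate towards utilities that increase with each objective before any user data has been collected.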
