Gaussian multi-armed bandit problems with multiple objectives

Motivated by the goal of formally integrating human designers into computational systems for engineering design optimization, I study decision making under uncertainty with multiple objectives in the context of the multi-armed bandit problem. A key aspect of multi-objective optimization is the need for scalarization, i.e., a way to combine the various objectives into a single well-defined scalar objective function. I study the case where the multi-objective rewards are Gaussian-distributed and the scalarization is linear, and I develop an algorithm that achieves optimal performance in the sense that it converges to selecting the best arm at the highest possible rate.
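To make the setting concrete, the following is a minimal sketch of a linearly scalarized Gaussian bandit, not the algorithm developed in this work. It relies on the fact that a linear scalarization w^T r of a Gaussian reward vector r is itself Gaussian, so the problem reduces to a single-objective Gaussian bandit. The sketch assumes independent objectives with a common known standard deviation `sigma`, a known weight vector `weights`, and a standard UCB1-style index; the names `scalarize` and `run_scalarized_ucb` and all numerical values are hypothetical.

```python
import math
import random


def scalarize(reward, weights):
    """Linear scalarization: weighted sum of the objective values."""
    return sum(w * r for w, r in zip(weights, reward))


def run_scalarized_ucb(means, weights, sigma=1.0, horizon=2000, seed=0):
    """UCB1-style play on scalarized rewards (illustrative assumption,
    not the paper's algorithm).

    means[i] is the mean reward vector of arm i. Each objective is drawn
    independently from N(mean, sigma^2), which is an assumption of this sketch.
    Returns the number of times each arm was selected.
    """
    rng = random.Random(seed)
    n_arms = len(means)
    counts = [0] * n_arms
    totals = [0.0] * n_arms
    # Standard deviation of w^T r under the independence assumption.
    scale = sigma * math.sqrt(sum(w * w for w in weights))

    for t in range(1, horizon + 1):
        if t <= n_arms:
            arm = t - 1  # pull every arm once to initialize the estimates
        else:
            arm = max(
                range(n_arms),
                key=lambda i: totals[i] / counts[i]
                + scale * math.sqrt(2.0 * math.log(t) / counts[i]),
            )
        reward = [rng.gauss(m, sigma) for m in means[arm]]
        counts[arm] += 1
        totals[arm] += scalarize(reward, weights)
    return counts


if __name__ == "__main__":
    # Three arms, two objectives; scalarized means are 0.9, 0.1 and 0.4.
    arm_means = [[1.0, 0.0], [0.0, 1.0], [0.4, 0.4]]
    print(run_scalarized_ucb(arm_means, weights=[0.9, 0.1]))
```

With these illustrative values the first arm has the largest scalarized mean, so the selection counts should concentrate on it as the horizon grows. Changing `weights` changes which arm is best, which is why the choice of scalarization matters.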
