Paper information - Gaussian multi-armed bandit problems with multiple objectives

Motivated by the goal of formally integrating human designers into computational systems for engineering design optimization, I study decision making under uncertainty with multiple objectives in the context of the multi-armed bandit problem. A key aspect of multi-objective optimization is the need for scalarization, i.e., a way to combine the various objectives into a single well-defined scalar objective function. I study the case where the multi-objective rewards are Gaussian distributed and the scalarization is linear, and I develop an algorithm that achieves optimal performance, i.e., converges to selecting the best arm at the highest possible rate.
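To make the role of linear scalarization concrete, the following is a minimal sketch, not the algorithm developed in the paper. It assumes the objectives of each arm are independent Gaussians with known variances, so that the weighted sum of a reward vector is itself a scalar Gaussian with mean equal to the weighted sum of the means and variance equal to the sum of the squared weights times the variances. The function names, the weight vector, and the generic UCB-style index used to pick arms are all illustrative assumptions.

```python
import math
import random


def scalarize(reward, weights):
    """Linear scalarization: weighted sum of the objective values."""
    return sum(w * r for w, r in zip(weights, reward))


def scalarized_gaussian(means, variances, weights):
    """Mean and variance of the scalarized reward.

    Assumes independent Gaussian objectives, so the weighted sum is
    Gaussian with mean sum(w*m) and variance sum(w^2 * v).
    """
    mean = sum(w * m for w, m in zip(weights, means))
    var = sum(w * w * v for w, v in zip(weights, variances))
    return mean, var


def run_ucb(arms, weights, horizon, seed=0):
    """Generic UCB-style loop on scalarized rewards (illustrative only).

    arms: list of (means, variances) pairs, one pair per arm.
    Returns the number of times each arm was selected.
    """
    rng = random.Random(seed)
    counts = [0] * len(arms)
    sums = [0.0] * len(arms)
    # Largest scalarized standard deviation, assumed known, scales the bonus.
    sigma = max(math.sqrt(scalarized_gaussian(m, v, weights)[1]) for m, v in arms)
    for t in range(1, horizon + 1):
        if t <= len(arms):
            k = t - 1  # initialization: play each arm once
        else:
            k = max(
                range(len(arms)),
                key=lambda i: sums[i] / counts[i]
                + sigma * math.sqrt(2.0 * math.log(t) / counts[i]),
            )
        means, variances = arms[k]
        # Draw a vector-valued reward, then collapse it to a scalar.
        reward = [rng.gauss(m, math.sqrt(v)) for m, v in zip(means, variances)]
        counts[k] += 1
        sums[k] += scalarize(reward, weights)
    return counts
```

For example, with two objectives weighted equally, `run_ucb([([0.0, 0.0], [1.0, 1.0]), ([5.0, 5.0], [1.0, 1.0])], [0.5, 0.5], 500)` concentrates its selections on the second arm, whose scalarized mean is larger. Because the scalarized reward is again Gaussian, the multi-objective problem under these assumptions reduces to a standard scalar Gaussian bandit.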

Paul B. Reverdy
