Paper Information - A User Comfort Model and Index Policy for Personalizing Discrete Controller Decisions

User feedback allows for tailoring system operation to ensure individual user satisfaction. A major challenge in personalized decision-making is the systematic construction of a user model during operation while maintaining control performance. This paper presents both an index-based control policy to smartly collect and process user feedback and a user comfort model in the form of a Markov decision process with a priori unknown user-specific state transition probabilities. The control policy utilizes explicit user feedback to optimize a reward measure reflecting user comfort and addresses the exploration-exploitation trade-off in a multi-armed bandit framework. The proposed approach combines restless bandits and upper confidence bound algorithms. It introduces an exploration term into the restless bandit formulation, utilizes user feedback to identify the user model, and is shown to be indexable. We demonstrate its capabilities with a simulation for learning a user's trade-off between comfort and energy usage.
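To make the notion of an exploration term concrete, the following is a minimal sketch of the classic UCB1 index rule for a stationary multi-armed bandit. It is not the paper's restless-bandit policy or its user comfort model. The names `ucb1_index` and `run_ucb1`, the constant `c`, and the reading of arms as candidate controller settings with binary user feedback as the reward are all assumptions made for illustration.

```python
import math
import random


def ucb1_index(mean_reward, pulls, total_pulls, c=2.0):
    """Sample mean plus an exploration bonus that shrinks as an arm is tried more.

    Hypothetical helper: `c` scales the bonus (c=2.0 gives the usual UCB1 form).
    """
    if pulls == 0:
        return math.inf  # an untried arm always has the highest index
    return mean_reward + math.sqrt(c * math.log(total_pulls) / pulls)


def run_ucb1(success_probs, horizon, seed=0):
    """Simulate UCB1 on Bernoulli arms.

    Assumption for illustration: each arm stands for one discrete controller
    setting, and a reward of 1.0 stands for positive user feedback.
    """
    rng = random.Random(seed)
    n_arms = len(success_probs)
    pulls = [0] * n_arms
    means = [0.0] * n_arms
    for t in range(1, horizon + 1):
        # Compute one index per arm and act greedily with respect to the indices.
        indices = [ucb1_index(means[a], pulls[a], t) for a in range(n_arms)]
        arm = max(range(n_arms), key=indices.__getitem__)
        reward = 1.0 if rng.random() < success_probs[arm] else 0.0
        pulls[arm] += 1
        # Incremental update of the sample mean for the chosen arm.
        means[arm] += (reward - means[arm]) / pulls[arm]
    return pulls, means


if __name__ == "__main__":
    pulls, means = run_ucb1([0.2, 0.5, 0.8], horizon=2000)
    print("pulls per arm:", pulls)
    print("estimated means:", [round(m, 3) for m in means])
```

The bonus term is what drives exploration: an arm that has rarely been chosen keeps a large index even if its sample mean is low, so it is revisited occasionally. The abstract describes adding a term of this kind to a restless bandit formulation, where arm states evolve over time; that extension is not shown here.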

Melanie N. Zeilinger | Marcel Menner
