Bandits with Global Convex Constraints and Objective
Multiarmed bandit (MAB) is a classic model for capturing the exploration–exploitation trade-off inherent in many sequential decision-making problems. The classic MAB framework, however, only allows...
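The exploration–exploitation trade-off the abstract refers to can be illustrated with the classic UCB1 index policy (this is a generic illustration of the MAB framework, not the constrained algorithm proposed in the paper): each round, the learner pulls the arm maximizing its empirical mean plus a confidence bonus that shrinks as the arm is sampled more.

```python
import math
import random

def ucb1(arm_means, horizon, seed=0):
    """Run the UCB1 index policy on simulated Bernoulli arms.

    arm_means: true success probabilities (unknown to the learner).
    Returns (total reward, per-arm pull counts) over `horizon` rounds.
    """
    rng = random.Random(seed)
    k = len(arm_means)
    counts = [0] * k      # number of pulls per arm
    sums = [0.0] * k      # cumulative reward per arm
    total = 0.0
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1   # pull each arm once to initialize estimates
        else:
            # index = empirical mean (exploitation) + confidence bonus (exploration)
            arm = max(range(k), key=lambda i: sums[i] / counts[i]
                      + math.sqrt(2 * math.log(t) / counts[i]))
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        total += reward
    return total, counts

total, counts = ucb1([0.2, 0.5, 0.8], horizon=5000)
print(counts)  # the best arm (index 2) should receive most of the pulls
```

As the horizon grows, the confidence bonus forces every arm to be sampled occasionally, while pulls concentrate on the empirically best arm; the paper's setting generalizes this by imposing global convex constraints and objectives on the aggregate outcome vector rather than maximizing a sum of per-round rewards.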