Bandits with Global Convex Constraints and Objective

The multiarmed bandit (MAB) is a classic model for capturing the exploration–exploitation trade-off inherent in many sequential decision-making problems. The standard MAB framework, however, only allows...