Decision-Making Under Selective Labels: Optimal Finite-Domain Policies and Beyond

Selective labels are a common feature of high-stakes decision-making applications: outcomes are observed only under one of the possible decisions (for example, repayment is observed only if a loan is granted). This paper studies the learning of decision policies in the face of selective labels, in an online setting that balances the cost of learning against future utility. In the homogeneous case, in which individuals' features are disregarded, the optimal decision policy is shown to be a threshold policy; the threshold becomes more stringent as more labels are collected, and the rate at which this tightening occurs is characterized. When features are drawn from a finite domain, the optimal policy consists of multiple homogeneous policies running in parallel, one per feature value. For the general infinite-domain case, the homogeneous policy is extended by using a probabilistic classifier and bootstrapping to supply its inputs. In experiments on synthetic and real data, the proposed policies achieve consistently superior utility, with no parameter tuning in the finite-domain case and lower parameter sensitivity in the general case.
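
As a rough illustration of the structure described above, the sketch below implements a Beta-Bernoulli version of a homogeneous threshold policy and the finite-domain extension that runs one such policy per feature value. The class names, the Beta(1, 1) prior, and the square-root tightening schedule are all illustrative assumptions; the paper characterizes the exact rate at which the optimal threshold tightens, which this placeholder schedule does not reproduce.

```python
import random
from collections import defaultdict


class HomogeneousThresholdPolicy:
    """Accept/reject decisions under selective labels: an outcome (label)
    is observed only when the individual is accepted.

    Maintains a Beta-Bernoulli posterior over the success probability and
    accepts while the posterior mean exceeds a threshold that tightens as
    more labels are collected. The tightening schedule below is a
    hypothetical placeholder, not the rate derived in the paper.
    """

    def __init__(self, base_threshold=0.5, tighten_rate=0.01):
        self.successes = 0
        self.failures = 0
        self.base_threshold = base_threshold
        self.tighten_rate = tighten_rate

    @property
    def n_labels(self):
        return self.successes + self.failures

    def threshold(self):
        # Threshold grows with the number of observed labels
        # (placeholder square-root schedule, capped at 1).
        return min(1.0, self.base_threshold + self.tighten_rate * self.n_labels ** 0.5)

    def decide(self):
        # Posterior mean of the success probability under a Beta(1, 1) prior.
        posterior_mean = (self.successes + 1) / (self.n_labels + 2)
        return posterior_mean >= self.threshold()

    def update(self, outcome):
        # Called only when the individual was accepted (selective labels:
        # rejected individuals yield no label).
        if outcome:
            self.successes += 1
        else:
            self.failures += 1


class FiniteDomainPolicy:
    """Finite-domain case: one homogeneous policy per feature value,
    run in parallel."""

    def __init__(self, **kwargs):
        self.policies = defaultdict(lambda: HomogeneousThresholdPolicy(**kwargs))

    def decide(self, feature):
        return self.policies[feature].decide()

    def update(self, feature, outcome):
        self.policies[feature].update(outcome)


# Usage sketch on synthetic data (hypothetical per-group success rates):
policy = FiniteDomainPolicy()
true_rate = {"A": 0.7, "B": 0.3}
for _ in range(1000):
    x = random.choice(["A", "B"])
    if policy.decide(x):  # accept -> outcome is observed
        policy.update(x, random.random() < true_rate[x])
```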
