Paper Information - Reinforcement Learning with Fairness Constraints for Resource Distribution in Human-Robot Teams

Much work in robotics and operations research has focused on optimal resource distribution, where an agent dynamically decides how to sequentially distribute resources among different candidates. However, most of this work ignores the notion of fairness in candidate selection. When a robot distributes resources to human team members, disproportionately favoring the highest-performing teammate can have negative effects on team dynamics and system acceptance. We introduce a multi-armed bandit algorithm with fairness constraints, where a robot distributes resources to human teammates of different skill levels. In this problem, the robot does not know the skill level of each human teammate, but learns it by observing their performance over time. We define fairness as a constraint on the minimum rate at which each human teammate is selected throughout the task. We provide theoretical guarantees on performance and conduct a large-scale user study in which we vary the level of fairness in our algorithm. Results show that fairness in resource distribution has a significant effect on users' trust in the system.
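To make the idea of a minimum selection rate concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the algorithm from the paper: it combines the standard UCB1 index with a simple rule that forces a pull whenever a teammate falls below a quota of floor(v * t) selections by round t. The function name `fair_ucb`, the parameter `min_rate` (the fairness level v), the Bernoulli reward model, and the example success probabilities are all hypothetical.

```python
import math
import random


def fair_ucb(success_probs, horizon, min_rate, seed=0):
    """UCB1 with a floor on each arm's selection rate (illustrative only).

    success_probs: assumed per-teammate success probabilities, hidden from
                   the learner and used only to simulate rewards.
    min_rate:      assumed fairness level v; the quota for every arm at
                   round t is floor(v * t) selections.
    Returns the number of times each arm was selected.
    """
    k = len(success_probs)
    if min_rate * k > 1:
        raise ValueError("min_rate must be at most 1 / number of arms")
    rng = random.Random(seed)
    counts = [0] * k   # selections per arm
    sums = [0.0] * k   # total observed reward per arm

    for t in range(1, horizon + 1):
        quota = math.floor(min_rate * t)
        # Fairness step: arms currently below their quota.
        behind = [i for i in range(k) if counts[i] < quota]
        if behind:
            arm = min(behind, key=lambda i: counts[i])
        elif 0 in counts:
            # UCB1 initialisation: try every arm once.
            arm = counts.index(0)
        else:
            # Standard UCB1 index: empirical mean plus exploration bonus.
            arm = max(
                range(k),
                key=lambda i: sums[i] / counts[i]
                + math.sqrt(2 * math.log(t) / counts[i]),
            )
        # Simulated Bernoulli outcome of the chosen teammate.
        reward = 1.0 if rng.random() < success_probs[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
    return counts


if __name__ == "__main__":
    # Three simulated teammates of different skill; each has a 20% quota.
    print(fair_ucb([0.9, 0.5, 0.3], horizon=2000, min_rate=0.2))
```

In this sketch, raising `min_rate` toward 1/k pushes the allocation toward an even split, while setting it to 0 recovers plain UCB1. Because several arms can fall behind in the same round, an arm may trail its quota by one selection until the next round catches it up.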
