Strategic advice provision in repeated human-agent interactions

This paper addresses the problem of automated advice provision in scenarios that involve repeated interactions between people and computer agents. This problem arises in many applications, such as route selection systems, office assistants, and climate control systems. To succeed in such settings, agents must reason about how their advice influences people’s future actions or decisions over time. This work models such scenarios as a family of repeated bilateral interactions called “choice selection processes”, in which humans and computer agents may share certain goals but are essentially self-interested. We propose a social agent for advice provision (SAP) for such environments. SAP generates advice using a social utility function that is a weighted sum of the individual utilities of the agent and human participants. The SAP agent models human choice selection using hyperbolic discounting and samples the model to infer the best weights for its social utility function. We demonstrate the effectiveness of SAP in two separate domains, which vary in the complexity of modeling human behavior as well as in the information that is available to people when they need to decide whether to accept the agent’s advice. In both domains, we evaluated SAP in extensive empirical studies involving hundreds of human subjects. SAP was compared to agents using alternative models of choice selection processes informed by behavioral economics and psychological models of decision-making. Our results show that in both domains, the SAP agent outperformed the alternative models. This work demonstrates the efficacy of combining computational methods with behavioral economics to model how people reason about machine-generated advice, and it presents a general methodology for agent design in such repeated advice settings.
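
The two ingredients named in the abstract, a weighted social utility and hyperbolic discounting, can be illustrated with a minimal Python sketch. This is not the authors' implementation. The function names, the convex weighting `w * u_agent + (1 - w) * u_human`, the discount form `1 / (1 + k * delay)`, and the example utilities are all assumptions made for illustration, and the step in which SAP samples the human model to infer the best weights is not shown.

```python
# Illustrative sketch only: names, functional forms and numbers are assumptions,
# not taken from the paper.

def hyperbolic_weight(delay: float, k: float = 1.0) -> float:
    """Standard hyperbolic discount factor 1 / (1 + k * delay)."""
    return 1.0 / (1.0 + k * delay)


def discounted_value(outcomes: list[float], k: float = 1.0) -> float:
    """Hyperbolically weighted average of past outcomes (most recent last)."""
    if not outcomes:
        return 0.0
    n = len(outcomes)
    # The most recent outcome has delay 0 and therefore the largest weight.
    weights = [hyperbolic_weight(n - 1 - i, k) for i in range(n)]
    return sum(w * o for w, o in zip(weights, outcomes)) / sum(weights)


def social_utility(u_agent: float, u_human: float, w: float) -> float:
    """Assumed convex combination of agent and human utilities, 0 <= w <= 1."""
    return w * u_agent + (1.0 - w) * u_human


def choose_advice(options: dict[str, tuple[float, float]], w: float) -> str:
    """Return the option whose (u_agent, u_human) pair maximizes social utility."""
    return max(options, key=lambda name: social_utility(*options[name], w))


# Hypothetical route options mapped to (agent utility, human utility).
routes = {"fast": (1.0, 9.0), "green": (8.0, 4.0)}
print(choose_advice(routes, w=0.0))  # weight entirely on the human's utility
print(choose_advice(routes, w=0.5))  # equal weight on both utilities
```

With `w = 0` the sketch advises whatever is best for the person, and with `w = 1` whatever is best for the agent. Intermediate weights trade the two off, which is the kind of balance the abstract describes the agent as inferring.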

Amos Azaria | Sarit Kraus | Claudia V. Goldman | Ya’akov Gal
