Paper information - Economic Recommendation Systems: One Page Abstract

In the online Explore & Exploit [E&E] literature, which is central to Machine Learning, a central planner is faced with a set of alternatives, each yielding some unknown reward. The planner's goal is to learn the optimal alternative as soon as possible, via experimentation. A typical assumption in this model is that the planner has full control over the design and implementation of the experiments. When experiments are implemented by a society of self-motivated agents, however, the planner can only recommend experimentation and has no power to enforce it. The first paper to marry the social aspects with the challenge of E&E, a new research domain for which we coin the term "social explore and exploit", is Kremer et al. [Kremer et al. 2014]. In that work the authors introduce a naive setting and study optimal explore and exploit schemes that account for agents' incentives. (We call a setting "naive" when the optimal non-social explore and exploit scheme is trivial: try all actions sequentially, each once, and settle on the optimal one thereafter.) More specifically, [Kremer et al. 2014] identify an incentive-compatible scheme with which a central planner can asymptotically steer the users towards taking the optimal action. Whereas [Kremer et al. 2014] account for agents' incentives, and in particular for the misalignment between the incentives of the agents and those of the planner, they ignore other societal aspects. In particular, [Kremer et al. 2014] implicitly assume that agents can neither observe nor communicate with one another. It turns out that, when observability is factored in, the scheme proposed by Kremer et al. is no longer incentive compatible, leading to market failure. In this work we introduce observability into the framework of social E&E. We study the design of recommendation systems when agents can (partly) observe each other. In particular, we investigate the conditions on the social network which allow for asymptotically optimal outcomes.
Thus, we extend [Kremer et al. 2014] by adding the layer of a social network, and we show conditions under which the essence of their results can still be maintained, albeit with a different mechanism, even though agents may observe each other. Intuitively, the more agents can see each other, the less power resides with the central planner. Formally, let a visibility graph over N agents be a graph in which the agents serve as the nodes and an edge (a, b) implies that agents a and b can observe each other's actions. A visibility graph is an (α, β)-graph if the number of nodes with degree greater than N^α is bounded by N^β. Our main result is that, for sufficiently large N, if the visibility graph is an (α, β)-graph with 2α + β < 1, then there exists a deterministic incentive-compatible algorithm leading to an approximately optimal outcome. On the other hand, we show that for the complete graph an asymptotically optimal outcome cannot be obtained by any probabilistic incentive-compatible algorithm. As the complete graph is a (0, 1)-graph, for which 2α + β = 1, our result is tight.
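
To make the (α, β)-graph definition concrete, the following is a minimal Python sketch that checks the degree condition on a finite visibility graph. It is an illustration only, not the algorithm of the paper: the function names are hypothetical, "bounded by" is read as "at most", the graph is assumed to be simple and undirected, and the result itself is asymptotic in N, so a check at a single finite N is only suggestive.

```python
def degrees(n_agents, edges):
    """Degree of each agent in an undirected visibility graph.

    Assumes agents are labelled 0..n_agents-1 and that `edges` holds
    each undirected edge (a, b) exactly once (no self-loops).
    """
    deg = [0] * n_agents
    for a, b in edges:
        deg[a] += 1
        deg[b] += 1
    return deg


def is_alpha_beta_graph(n_agents, edges, alpha, beta):
    """Check that at most N^beta nodes have degree greater than N^alpha.

    "Bounded by" is interpreted as "<=" here, which is an assumption.
    """
    threshold = n_agents ** alpha
    high_degree_nodes = sum(1 for d in degrees(n_agents, edges) if d > threshold)
    return high_degree_nodes <= n_agents ** beta


def satisfies_main_condition(alpha, beta):
    """The condition 2*alpha + beta < 1 from the main result."""
    return 2 * alpha + beta < 1


# Hypothetical example graphs on N = 100 agents.
N = 100
ring = [(i, (i + 1) % N) for i in range(N)]                    # every degree is 2
complete = [(i, j) for i in range(N) for j in range(i + 1, N)]  # every degree is N - 1

ring_ok = is_alpha_beta_graph(N, ring, 0.2, 0.5) and satisfies_main_condition(0.2, 0.5)
complete_is_0_1 = is_alpha_beta_graph(N, complete, 0, 1)
complete_in_range = satisfies_main_condition(0, 1)
```

In this sketch the ring passes both checks with (α, β) = (0.2, 0.5), while the complete graph is a (0, 1)-graph that sits exactly on the boundary 2α + β = 1 and therefore fails the strict inequality.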

[1] Yishay Mansour, et al. Implementing the “Wisdom of the Crowd”, 2013, Journal of Political Economy.