The Platform Design Problem

Online firms deploy suites of software platforms, each designed to interact with users during a certain activity, such as browsing, chatting, socializing, emailing, or driving. The economic and incentive structure of this exchange, as well as its algorithmic nature, have not, to our knowledge, been explored before; we initiate their study in this paper. We model the interaction as a Stackelberg game between a Designer and one or more Agents. An Agent is modeled as a Markov chain whose states are activities; we assume that the Agent's utility is a linear function of the chain's steady-state distribution. The Designer may design a platform for each of these activities/states; if the Agent adopts a platform, the transition probabilities of the Markov chain change, and so does the Agent's objective. The Designer's utility is a linear function of the steady-state probabilities of the accessible states (those for which the platform has been adopted), minus the development cost of the platforms. The Agent's underlying optimization problem -- choosing the states for which to adopt a platform -- is an MDP. If this MDP has a simple yet plausible structure (the transition probability from one state to another depends only on the target state and the recurrent probability of the current state), the Agent's problem can be solved by a greedy algorithm. The Designer's optimization problem -- designing a custom suite for the Agent so as to maximize, through the Agent's optimal reaction, the Designer's revenue -- is NP-hard but admits an FPTAS. Under mild additional assumptions, these results generalize from a single Agent to a distribution of Agents with finite support. Finally, the Designer's optimization problem has an abysmal "price of robustness", suggesting that learning the parameters of the problem is crucial for the Designer.
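To make the structured chain and the Agent's greedy reaction concrete, here is a minimal illustrative sketch. All names and the specific parameterization are assumptions, not the paper's exact model: we take transitions of the form P(i, j) = r_i·1[i = j] + (1 − r_i)·p_j, where r_i is the recurrent probability of state i and p is a fixed target distribution. Under this form the stationary distribution has the closed form π_j ∝ p_j / (1 − r_j), and adopting a platform in state i is modeled as replacing r_i with a platform-specific value.

```python
# Illustrative sketch only: the transition form P(i, j) = r_i*[i == j] + (1 - r_i)*p_j
# and all parameter names below are assumptions, not the paper's exact model.

def stationary(p, r):
    """Stationary distribution of the structured chain.
    Under this structure, pi_j is proportional to p_j / (1 - r_j)."""
    w = [pj / (1.0 - rj) for pj, rj in zip(p, r)]
    s = sum(w)
    return [x / s for x in w]

def utility(u, pi):
    """The Agent's linear utility: dot product of u with the stationary distribution."""
    return sum(ui * pii for ui, pii in zip(u, pi))

def greedy_adopt(p, r_base, r_platform, u):
    """Greedy adoption sketch: repeatedly adopt the platform (replacing r_base[i]
    with r_platform[i]) that most increases the Agent's utility, stopping when
    no single additional adoption improves it."""
    n = len(p)
    adopted = set()
    r = list(r_base)
    best = utility(u, stationary(p, r))
    while True:
        gains = []
        for i in range(n):
            if i in adopted:
                continue
            r_try = list(r)
            r_try[i] = r_platform[i]
            gains.append((utility(u, stationary(p, r_try)) - best, i))
        if not gains:
            break
        gain, i = max(gains)  # tuples compare by gain first
        if gain <= 1e-12:
            break
        adopted.add(i)
        r[i] = r_platform[i]
        best += gain
    return adopted, best
```

For example, with two activities where the platform for activity 0 makes that state much "stickier" (r_0: 0.2 → 0.8) and the Agent only values time in state 0, the greedy step adopts that platform first and raises the Agent's utility accordingly.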
