Strategic Experimentation: The Case of Poisson Bandits

This paper studies a game of strategic experimentation in which the players have access to two-armed bandits where the risky arm distributes lump-sum payoffs according to a Poisson process with unknown intensity. Because of free-riding, there is an inefficiently low level of experimentation in any equilibrium where the players use stationary Markovian strategies. We characterize the unique symmetric Markovian equilibrium of the game, which is in mixed strategies. A variety of asymmetric pure-strategy equilibria is then constructed for the special case where there are two players and the arrival of the first lump-sum fully reveals the quality of the risky arm. Equilibria where players switch finitely often between the roles of experimenter and free-rider all lead to the same pattern of information acquisition; the efficiency of these equilibria depends on the way players share the burden of experimentation among them. We show that, at least for relatively pessimistic beliefs, even the worst asymmetric equilibrium is more efficient than the symmetric one. In equilibria where players switch roles infinitely often, they can acquire an approximately efficient amount of information, but the rate at which it is acquired still remains inefficient.