Poisson Bandits of Evolving Shade of Gray

In standard optimal stopping problems, actions are artificially restricted to the moments at which costs or benefits are observed. In standard experimentation and learning models based on two-armed Poisson bandits, an action can be taken between two consecutive observations. The latter models, however, do not recognize that the timing of decisions depends not only on the rate at which observations arrive, but also on the stochastic dynamics of costs or benefits. We combine these two strands of the literature and consider bandits of "evolving shade of gray" instead of two-armed bandits that are either "white knights" or "black villains." Stopping decisions in a model with Poisson bandits of "evolving shade of gray" are qualitatively different from those in optimal stopping or Poisson bandit models. We demonstrate that it may not be optimal to act immediately upon an observation, even if successes or failures are conclusive.
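
The abstract contrasts two action-timing regimes: acting only at observation times versus acting at any time while the payoff itself evolves stochastically. As a purely illustrative Monte Carlo sketch (not the authors' model), the snippet below compares a simple threshold rule under the two regimes, assuming a geometric Brownian payoff and Poisson arrivals of conclusive signals; all parameter values and the threshold rule are assumptions chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative parameters (assumptions, not taken from the paper):
mu, sigma = 0.02, 0.3      # drift and volatility of the payoff process
lam = 1.0                  # Poisson arrival rate of conclusive observations
r = 0.05                   # discount rate
T, dt = 10.0, 0.01         # horizon and time step
threshold = 1.2            # act only when the payoff exceeds this level

def simulate(act_only_at_observations: bool) -> float:
    """Average discounted payoff of a threshold stopping rule over many paths."""
    n_paths, total = 2000, 0.0
    for _ in range(n_paths):
        x, t = 1.0, 0.0
        next_obs = rng.exponential(1.0 / lam)   # time of the first observation
        while t < T:
            t += dt
            # Geometric Brownian step for the evolving payoff ("shade of gray")
            x *= np.exp((mu - 0.5 * sigma**2) * dt
                        + sigma * np.sqrt(dt) * rng.normal())
            observed = t >= next_obs
            if observed:
                next_obs += rng.exponential(1.0 / lam)
            # Regime 1: action allowed only when an observation has just arrived;
            # Regime 2: action allowed at any instant.
            can_act = observed if act_only_at_observations else True
            if can_act and x >= threshold:
                total += np.exp(-r * t) * x     # discounted payoff at stopping
                break
    return total / n_paths

print("act only at observation times:", simulate(True))
print("act at any time:              ", simulate(False))
```

The sketch only illustrates that the value of a fixed rule differs across the two timing regimes; the paper's point is the stronger, model-based claim that even with full timing flexibility, acting immediately upon a conclusive observation need not be optimal.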
