An optimal policy for sampling from uncertain distributions

It is known [see, for example, Ross (1970)] that, if the distribution of the offers is known, then the optimal stopping policy when sampling with recall is to stop and accept the current offer if the cost of continuing one more step is greater than or equal to the expected gain of continuing one more step. This paper investigates a sampling-with-recall problem in which the distribution of the offers is unknown. Our approach is a Bayesian adaptation of the techniques presented by Ross (1970).
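The known-distribution stopping rule can be illustrated with a short sketch. It is not the Bayesian method developed in this paper. It assumes a discrete offer distribution given as parallel lists `values` and `probs`, and a constant cost `cost` per additional offer. The names `expected_gain`, `should_stop` and `run_policy` are hypothetical and chosen only for illustration. If `best` is the largest offer held so far, the expected gain from one more offer is E[max(X, best)] - best = E[(X - best)^+], and the rule stops once `cost` is at least this quantity.

```python
from typing import Iterable, Sequence, Tuple


def expected_gain(best: float, values: Sequence[float], probs: Sequence[float]) -> float:
    """Expected improvement from one more offer: E[(X - best)^+] for a discrete X.

    Assumption: the offer distribution is known and discrete (values with probabilities).
    """
    return sum(p * max(x - best, 0.0) for x, p in zip(values, probs))


def should_stop(best: float, cost: float, values: Sequence[float], probs: Sequence[float]) -> bool:
    """One-step look-ahead rule: stop when the cost of continuing >= the expected gain."""
    return cost >= expected_gain(best, values, probs)


def run_policy(offers: Iterable[float], cost: float,
               values: Sequence[float], probs: Sequence[float]) -> Tuple[float, int]:
    """Observe offers one at a time (sampling with recall) and apply the stopping rule.

    Returns the best offer held when sampling stops and the number of offers observed.
    Hypothetical helper for illustration; `offers` stands in for draws from the distribution.
    """
    best = float("-inf")
    n = 0
    for offer in offers:
        n += 1
        best = max(best, offer)  # with recall, the best offer seen so far remains available
        if should_stop(best, cost, values, probs):
            break
    return best, n
```

For instance, with offers equally likely to be 1, 2, 3 or 4 and a cost of 0.5 per offer, holding an offer of 2 gives an expected gain of 0.25 * 1 + 0.25 * 2 = 0.75 > 0.5, so sampling continues. Holding 3 gives 0.25 * 1 = 0.25 <= 0.5, so sampling stops. On the offer sequence 2, 1, 3, 4 the sketch therefore stops after the third offer, holding 3. Because the expected gain can only shrink as `best` grows, checking the one-step condition at each offer is enough in this known-distribution setting.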