Playing the matching-shoulders lob-pass game with logarithmic regret
The best previous algorithm for the matching-shoulders lob-pass game, ARTHUR (Abe and Takeuchi 1993), suffers $O(t^{1/2})$ regret. We prove that this is the best possible performance for any algorithm that works by accurately estimating the opponent's payoff lines. We then describe an algorithm that beats this bound and matches the information-theoretic lower bound, achieving $O(\log t)$ regret by converging to the best lob rate without accurately estimating the payoff lines. The noise-tolerant binary search procedure that we develop is of independent interest.
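The abstract does not spell out the paper's noise-tolerant binary search, so what follows is only a generic illustration of the underlying idea and not the authors' procedure. It is a bisection that tolerates erroneous comparisons by querying each midpoint several times and following the majority answer. The names `noisy_binary_search` and `make_noisy_oracle`, the simulated oracle, and the parameters `steps` and `repeats` are all hypothetical.

```python
import random
from typing import Callable


def noisy_binary_search(
    query: Callable[[float], bool],
    lo: float = 0.0,
    hi: float = 1.0,
    steps: int = 20,
    repeats: int = 15,
) -> float:
    """Locate an unknown threshold in [lo, hi] from noisy comparisons.

    query(x) should report whether x lies below the threshold, being correct
    with probability greater than 1/2. Each midpoint is queried `repeats`
    times and the majority answer decides which half to keep, so a single
    wrong answer cannot derail the search.
    """
    for _ in range(steps):
        mid = (lo + hi) / 2
        votes = sum(query(mid) for _ in range(repeats))  # count "below" answers
        if 2 * votes > repeats:  # majority says mid is below the threshold
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def make_noisy_oracle(threshold: float, p_correct: float, seed: int = 0) -> Callable[[float], bool]:
    """Hypothetical oracle: answers 'x < threshold' correctly with probability p_correct."""
    rng = random.Random(seed)

    def query(x: float) -> bool:
        truth = x < threshold
        return truth if rng.random() < p_correct else not truth

    return query


if __name__ == "__main__":
    oracle = make_noisy_oracle(threshold=0.37, p_correct=0.9)
    estimate = noisy_binary_search(oracle, repeats=25)
    print(f"estimated threshold: {estimate:.4f}")
```

Majority voting multiplies the number of queries per step by `repeats`; the works on searching with errors in the reference list address error-tolerant search in more query-efficient ways.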
[1] Joel H. Spencer et al. Coping with errors in binary search procedures. J. Comput. Syst. Sci., 1980.
[2] Naoki Abe et al. The “lob-pass” problem and an on-line learning model of rational choice. COLT '93, 1993.
[3] S. Rao Kosaraju et al. Comparison-based search in the presence of errors. STOC, 1993.