No-Regret and Incentive-Compatible Online Learning

We study online learning settings in which experts act strategically to maximize their influence on the learning algorithm's predictions by potentially misreporting their beliefs about a sequence of binary events. Our goal is twofold. First, we want the learning algorithm to be no-regret with respect to the best fixed expert in hindsight. Second, we want incentive compatibility, a guarantee that each expert's best strategy is to report their true beliefs about the realization of each event. To achieve this goal, we build on the literature on wagering mechanisms, a type of multi-agent scoring rule. We provide algorithms that achieve no regret and incentive compatibility for myopic experts in both the full and partial information settings. In experiments on datasets from FiveThirtyEight, our algorithms have regret comparable to that of classic no-regret algorithms, which are not incentive-compatible. Finally, we identify an incentive-compatible algorithm for forward-looking strategic agents that exhibits diminishing regret in practice.
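As a point of reference for the no-regret baseline mentioned above, the following is a minimal sketch of a classic (non-incentive-compatible) experts algorithm: Hedge with multiplicative weights, scoring each expert's reported probability for a binary event with the quadratic (Brier) loss. The function name `hedge_regret`, the learning rate `eta`, and the synthetic data are illustrative assumptions, not the paper's mechanism.

```python
import math

def hedge_regret(expert_probs, outcomes, eta=0.5):
    """Run Hedge over experts who report probabilities for binary events.

    expert_probs: list over rounds; each entry is a list of per-expert
        reported probabilities in [0, 1] for the event occurring.
    outcomes: list of realized outcomes in {0, 1}, one per round.
    Returns (algorithm_loss, best_expert_loss) under quadratic (Brier) loss.
    """
    n = len(expert_probs[0])          # number of experts
    weights = [1.0] * n               # multiplicative weights, initially uniform
    alg_loss = 0.0
    expert_loss = [0.0] * n
    for probs, y in zip(expert_probs, outcomes):
        total = sum(weights)
        # Predict the weighted average of the experts' reports.
        p = sum(w * q for w, q in zip(weights, probs)) / total
        alg_loss += (p - y) ** 2
        # Score each expert and downweight exponentially in its loss.
        for i, q in enumerate(probs):
            loss = (q - y) ** 2
            expert_loss[i] += loss
            weights[i] *= math.exp(-eta * loss)
    return alg_loss, min(expert_loss)
```

For example, with one well-calibrated expert (always reporting 0.9) and one poor expert (always 0.1) over 50 rounds in which the event always occurs, the algorithm's cumulative loss stays within a small additive gap of the best expert's loss. Because the weight update depends on each expert's reported probability, a strategic expert can gain influence by exaggerating reports, which is exactly the incentive problem the paper addresses.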