Online Learning Using Only Peer Prediction

This paper considers a variant of the classical online learning problem with expert predictions. What distinguishes our model, and makes it challenging, is the absence of any direct feedback on the loss each expert incurs at each time step $t$. We propose an approach based on peer prediction and identify conditions under which it succeeds. Our techniques revolve around a carefully designed peer score function $s()$ that scores each expert's predictions against the peer consensus. We establish a sufficient condition, which we call \emph{peer calibration}, under which standard online learning algorithms fed loss feedback computed by the carefully crafted $s()$ achieve bounded regret with respect to the unrevealed ground-truth values. We then demonstrate how suitable $s()$ functions can be derived under different assumptions and models.
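To make the setup concrete, here is a minimal sketch of the kind of pipeline the abstract describes: a standard multiplicative-weights (Hedge) learner whose per-round losses come not from the ground truth but from a peer score. The particular score used here (disagreement with a randomly chosen reference peer) is a hypothetical placeholder, not the paper's construction of $s()$; the function and parameter names are illustrative only.

```python
# Minimal sketch, assuming a placeholder peer score in place of the paper's s().
# Hedge / multiplicative weights driven purely by peer-prediction feedback,
# with no access to the ground-truth losses.
import numpy as np

def peer_score(predictions, i, rng):
    """Surrogate loss for expert i: disagreement with a random reference peer."""
    peers = [j for j in range(len(predictions)) if j != i]
    ref = rng.choice(peers)
    return float(predictions[i] != predictions[ref])

def hedge_with_peer_feedback(expert_preds, eta=0.1, seed=0):
    """expert_preds: (T, N) array of binary predictions from N experts over T rounds."""
    rng = np.random.default_rng(seed)
    T, N = expert_preds.shape
    log_w = np.zeros(N)  # log-weights for numerical stability
    for t in range(T):
        # Losses come only from the peer score, never from the hidden ground truth.
        losses = np.array([peer_score(expert_preds[t], i, rng) for i in range(N)])
        log_w -= eta * losses  # multiplicative-weights update on peer losses
    w = np.exp(log_w - log_w.max())
    return w / w.sum()  # final distribution over experts

# Usage example with synthetic predictions from 5 experts over 100 rounds.
preds = np.random.default_rng(1).integers(0, 2, size=(100, 5))
print(hedge_with_peer_feedback(preds))
```

The point of the sketch is only the interface: the learner is unchanged, and all of the modeling effort sits in how the peer score is defined, which is where peer calibration enters.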
