Upper Confidence Weighted Learning for Efficient Exploration in Multiclass Prediction with Binary Feedback

We introduce a novel algorithm called Upper Confidence Weighted Learning (UCWL) for online multiclass learning from binary feedback. UCWL combines the Upper Confidence Bound (UCB) framework with the Soft Confidence Weighted (SCW) online learning scheme. UCWL achieves state-of-the-art performance, especially on noisy and non-separable data, at low computational cost. It uses estimated confidence intervals for informed exploration, which enables faster learning than either uninformed exploration or no exploration at all. The target application is human-robot interaction (HRI), in which a robot learns to classify its observations while a human teaches it by providing only binary feedback (e.g., right/wrong). Results from an HRI experiment and on two benchmark datasets show that UCWL outperforms other algorithms in the online binary-feedback setting and, surprisingly, sometimes even beats state-of-the-art algorithms that receive full feedback, although UCWL receives only binary feedback on the same data.
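The abstract describes UCWL only at a high level, so the following Python sketch illustrates the general idea of pairing UCB-style exploration with SCW updates under explicitly assumed details. It is not the authors' algorithm. It assumes one SCW-I binary learner per class (the names `SCWBinary`, `UCBMulticlass` and `train_with_binary_feedback` are hypothetical). Each learner keeps a Gaussian over its weight vector with mean `mu` and covariance `sigma`, and the predicted class is the one with the largest upper confidence bound, the mean score plus `explore` times the standard deviation sqrt(xᵀΣx). How a right/wrong answer is turned into binary updates is also an assumption: a "right" answer reveals the true label, so every class is updated, while a "wrong" answer only penalizes the guessed class. The parameters `explore`, `eta` and `c` are illustrative choices.

```python
import math
from statistics import NormalDist


def dot(a, b):
    """Inner product of two equal-length lists."""
    return sum(ai * bi for ai, bi in zip(a, b))


def matvec(m, v):
    """Matrix-vector product for a list-of-lists matrix m."""
    return [dot(row, v) for row in m]


class SCWBinary:
    """Hypothetical SCW-I binary learner with weights w ~ N(mu, sigma)."""

    def __init__(self, dim, eta=0.9, c=1.0):
        self.mu = [0.0] * dim
        self.sigma = [[1.0 if i == j else 0.0 for j in range(dim)] for i in range(dim)]
        self.phi = NormalDist().inv_cdf(eta)  # confidence parameter; assumes eta > 0.5
        self.c = c  # cap on the step size (the "soft" part of SCW-I)

    def mean_and_var(self, x):
        """Return (mu . x, x^T sigma x, sigma x) for input x."""
        sx = matvec(self.sigma, x)
        return dot(self.mu, x), dot(x, sx), sx

    def update(self, x, y):
        """One SCW-I step for a binary label y in {-1, +1}."""
        mean, v, sx = self.mean_and_var(x)
        m, phi = y * mean, self.phi
        if phi * math.sqrt(v) - m <= 0.0:
            return  # loss is zero: stay passive
        psi, zeta = 1.0 + phi ** 2 / 2.0, 1.0 + phi ** 2
        alpha = (-m * psi + math.sqrt(m * m * phi ** 4 / 4.0 + v * phi ** 2 * zeta)) / (v * zeta)
        alpha = min(self.c, max(0.0, alpha))
        u = 0.25 * (-alpha * v * phi + math.sqrt(alpha ** 2 * v ** 2 * phi ** 2 + 4.0 * v)) ** 2
        beta = alpha * phi / (math.sqrt(u) + v * alpha * phi)
        # mu <- mu + alpha*y*(sigma x); sigma <- sigma - beta*(sigma x)(sigma x)^T,
        # using sigma x x^T sigma = (sigma x)(sigma x)^T because sigma is symmetric.
        self.mu = [mi + alpha * y * si for mi, si in zip(self.mu, sx)]
        n = len(sx)
        self.sigma = [[self.sigma[i][j] - beta * sx[i] * sx[j] for j in range(n)]
                      for i in range(n)]


class UCBMulticlass:
    """Hypothetical UCB-style multiclass predictor over per-class SCW learners."""

    def __init__(self, n_classes, dim, explore=1.0, eta=0.9, c=1.0):
        self.learners = [SCWBinary(dim, eta, c) for _ in range(n_classes)]
        self.explore = explore  # 0 gives greedy prediction (no exploration)

    def predict(self, x):
        """Return the class whose upper confidence bound is largest."""
        scores = []
        for learner in self.learners:
            mean, var, _ = learner.mean_and_var(x)
            scores.append(mean + self.explore * math.sqrt(var))
        return max(range(len(scores)), key=scores.__getitem__)  # ties -> lowest index

    def feedback(self, x, predicted, correct):
        """Learn from a single right/wrong answer about the predicted class."""
        if correct:
            # "Right" reveals the true label, so every class gets an update.
            for k, learner in enumerate(self.learners):
                learner.update(x, 1 if k == predicted else -1)
        else:
            # "Wrong" only rules out the predicted class.
            self.learners[predicted].update(x, -1)


def train_with_binary_feedback(model, stream):
    """Simulated teacher that only says right/wrong; returns the mistake count."""
    mistakes = 0
    for x, label in stream:
        guess = model.predict(x)
        correct = guess == label  # the only signal the learner receives
        if not correct:
            mistakes += 1
        model.feedback(x, guess, correct)
    return mistakes
```

With `explore=0` the sketch predicts from the means alone, which corresponds to the no-exploration baseline; larger values favor classes whose scores are still uncertain, since the variance term shrinks only as a class's learner receives updates.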
