Efficient Interactive Multiclass Learning from Binary Feedback

We introduce upper confidence-weighted learning (UCWL), a novel algorithm for online multiclass learning from binary feedback, that is, feedback indicating only whether the prediction was right or wrong. UCWL combines the upper confidence bound (UCB) framework with the soft confidence-weighted (SCW) online learning scheme. Under UCB, each instance is classified using both the score and the uncertainty of every class, so for a given instance in the sequence the algorithm may guess a class label primarily to reduce that class's uncertainty. This is a form of informed exploration, and it lets performance improve with lower sample complexity than learning without exploration. Combining UCB with SCW makes the learner robust to noisy and nonseparable data, and it achieves state-of-the-art performance without increasing the computational cost. A potential application setting is human-robot interaction (HRI), in which a robot learns to classify a set of inputs while a human teaches it by providing only binary feedback, some of which may itself be wrong. Experimental results in the HRI setting and on two benchmark datasets from other settings show that UCWL outperforms other state-of-the-art algorithms in the online binary feedback setting. Surprisingly, it sometimes even outperforms state-of-the-art algorithms that receive full feedback (i.e., the true class label), although UCWL receives only binary feedback on the same data sequence.
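The interaction loop described above can be sketched in a few lines of Python. This is a minimal illustration and not the UCWL algorithm itself: it assumes one linear model per class with a diagonal covariance, picks the class with the largest score plus uncertainty, and, after the binary feedback arrives, updates only the guessed class. The update shown is an AROW-style rule used as a stand-in for the SCW update, and the names `UCBDiagonalLearner`, `eta`, `r`, and `simulate` are hypothetical.

```python
import math
import random


class UCBDiagonalLearner:
    """Illustrative one-vs-all learner with a diagonal covariance per class."""

    def __init__(self, n_classes, n_features, eta=1.0, r=1.0):
        self.eta = eta  # weight of the uncertainty term (assumed parameter)
        self.r = r      # AROW-style regularizer (assumed parameter)
        self.w = [[0.0] * n_features for _ in range(n_classes)]
        self.sigma = [[1.0] * n_features for _ in range(n_classes)]

    def score(self, k, x):
        # Mean prediction of class k: w_k . x
        return sum(wi * xi for wi, xi in zip(self.w[k], x))

    def uncertainty(self, k, x):
        # sqrt(x^T Sigma_k x) for a diagonal Sigma_k
        return math.sqrt(sum(s * xi * xi for s, xi in zip(self.sigma[k], x)))

    def predict(self, x):
        # Upper confidence bound: score plus scaled uncertainty, per class.
        ucb = [self.score(k, x) + self.eta * self.uncertainty(k, x)
               for k in range(len(self.w))]
        return max(range(len(ucb)), key=ucb.__getitem__)

    def update(self, x, guess, correct):
        # Binary feedback only tells us whether `guess` was right, so only
        # that class's model is updated, with target +1 or -1.
        y = 1.0 if correct else -1.0
        margin = y * self.score(guess, x)
        if margin >= 1.0:
            return  # confident and correct enough: no change
        w, s = self.w[guess], self.sigma[guess]
        v = sum(si * xi * xi for si, xi in zip(s, x))
        beta = 1.0 / (v + self.r)
        alpha = (1.0 - margin) * beta
        for i, xi in enumerate(x):
            w[i] += alpha * y * s[i] * xi      # move the mean
            s[i] -= beta * (s[i] * xi) ** 2    # shrink the variance


def simulate(rounds=300, n_classes=3, seed=0):
    """Run the learner on synthetic data, revealing only right/wrong."""
    rng = random.Random(seed)
    learner = UCBDiagonalLearner(n_classes, n_features=n_classes)
    mistakes = 0
    for _ in range(rounds):
        label = rng.randrange(n_classes)
        x = [rng.gauss(0.0, 0.1) for _ in range(n_classes)]
        x[label] += 1.0
        guess = learner.predict(x)
        correct = guess == label
        learner.update(x, guess, correct)  # the true label is never passed in
        mistakes += not correct
    return learner, mistakes
```

Calling `simulate()` returns the trained learner and the number of wrong guesses. Because a wrong guess lowers both the score and the variance of the guessed class, the untried classes keep a larger uncertainty bonus and are tried next, which is the informed exploration the abstract refers to.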
