Online Semi-Supervised Learning with Bandit Feedback

We formulate a new problem at the intersection of semi-supervised learning and contextual bandits, motivated by several applications including clinical trials and ad recommendations. We demonstrate how a Graph Convolutional Network (GCN), a semi-supervised learning approach, can be adjusted to the new problem formulation. We also propose a variant of the linear contextual bandit with semi-supervised imputation of missing rewards. We then take the best of both approaches to develop a multi-GCN embedded contextual bandit. Our algorithms are verified on several real-world datasets.
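To make the idea of a linear contextual bandit with imputed missing rewards concrete, the following is a minimal sketch, not the algorithm from the paper. It assumes a standard LinUCB-style learner and, purely for illustration, fills in a missing reward with the mean observed reward of the k nearest previously seen contexts for the same arm. The class name `ImputingLinUCB`, the parameters `alpha` and `k`, and the nearest-neighbour imputation rule are all hypothetical choices made for this example.

```python
import numpy as np


class ImputingLinUCB:
    """LinUCB-style learner with a nearest-neighbour fill-in for missing rewards.

    Illustrative only: the imputation rule is an assumption of this sketch,
    not the method described in the abstract.
    """

    def __init__(self, n_arms, dim, alpha=1.0, k=3):
        self.alpha = alpha  # exploration width (hypothetical default)
        self.k = k          # neighbours used for imputation (hypothetical default)
        self.A = [np.eye(dim) for _ in range(n_arms)]    # per-arm design matrices
        self.b = [np.zeros(dim) for _ in range(n_arms)]  # per-arm reward-weighted context sums
        self.seen = [[] for _ in range(n_arms)]          # per-arm (context, observed reward) pairs

    def select(self, x):
        """Return the arm with the highest upper confidence bound for context x."""
        scores = []
        for A, b in zip(self.A, self.b):
            A_inv = np.linalg.inv(A)
            theta = A_inv @ b  # ridge-regression estimate for this arm
            scores.append(theta @ x + self.alpha * np.sqrt(x @ A_inv @ x))
        return int(np.argmax(scores))

    def impute(self, arm, x):
        """Guess a missing reward from the k nearest observed contexts, or None."""
        history = self.seen[arm]
        if not history:
            return None
        contexts = np.array([c for c, _ in history])
        rewards = np.array([r for _, r in history])
        dists = np.linalg.norm(contexts - x, axis=1)
        nearest = np.argsort(dists)[: self.k]
        return float(rewards[nearest].mean())

    def update(self, arm, x, reward=None):
        """Update the chosen arm; pass reward=None when feedback is missing."""
        if reward is None:
            reward = self.impute(arm, x)
            if reward is None:
                return  # no observed rewards for this arm yet, so skip the update
        else:
            # Only truly observed rewards are stored for later imputation.
            self.seen[arm].append((x.copy(), float(reward)))
        self.A[arm] += np.outer(x, x)
        self.b[arm] += reward * x
```

In use, one would call `select(x)` on each incoming context, play the returned arm, and then call `update(arm, x, reward)` with `reward=None` whenever no feedback arrives. Storing only genuinely observed rewards in `seen` is a deliberate choice in this sketch so that imputed values do not feed back into later imputations.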
