Online Semi-Supervised Learning with Bandit Feedback

We formulate a new problem at the intersection of semi-supervised learning and contextual bandits, motivated by several applications including clinical trials and ad recommendations. We demonstrate how the Graph Convolutional Network (GCN), a semi-supervised learning approach, can be adapted to the new problem formulation. We also propose a variant of the linear contextual bandit with semi-supervised imputation of missing rewards. We then combine the strengths of both approaches to develop a multi-GCN embedded contextual bandit. Our algorithms are validated on several real-world datasets.
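The abstract does not specify how missing rewards are imputed, so what follows is only a minimal sketch, under our own assumptions, of what a linear contextual bandit with imputed missing rewards could look like. It uses the standard disjoint LinUCB model (one ridge-regression estimate per arm plus an upper-confidence bonus). When a round's reward is missing, it imputes a pseudo-reward as the mean observed reward of the same arm over the `k` nearest past contexts and adds it with a reduced weight. The class name `LinUCBWithImputation`, the k-nearest-neighbor imputation rule, and the parameters `alpha`, `k`, and `impute_weight` are hypothetical illustrations, not the paper's method.

```python
import numpy as np


class LinUCBWithImputation:
    """Disjoint LinUCB that also learns from rounds whose reward is missing.

    Hypothetical imputation rule (an assumption, not the paper's method): a
    missing reward for arm `a` is replaced by the mean observed reward of arm
    `a` over the `k` past contexts nearest (Euclidean) to the current one, and
    that pseudo-reward enters the ridge statistics with weight `impute_weight`.
    """

    def __init__(self, n_arms, dim, alpha=1.0, k=5, impute_weight=0.5):
        self.alpha = alpha  # width of the upper-confidence bonus
        self.k = k
        self.impute_weight = impute_weight
        self.A = [np.eye(dim) for _ in range(n_arms)]    # per-arm I + sum of w * x x^T
        self.b = [np.zeros(dim) for _ in range(n_arms)]  # per-arm sum of w * r * x
        self.history = [[] for _ in range(n_arms)]       # (context, observed reward) pairs

    def select(self, x):
        """Return the arm with the highest UCB score for context x."""
        scores = []
        for A, b in zip(self.A, self.b):
            A_inv = np.linalg.inv(A)
            theta = A_inv @ b  # ridge estimate of the arm's weight vector
            scores.append(theta @ x + self.alpha * np.sqrt(x @ A_inv @ x))
        return int(np.argmax(scores))

    def _impute(self, arm, x):
        """kNN pseudo-reward from observed rounds of `arm`; None if there are none."""
        past = self.history[arm]
        if not past:
            return None
        dists = [np.linalg.norm(x - c) for c, _ in past]
        nearest = np.argsort(dists)[: self.k]
        return float(np.mean([past[i][1] for i in nearest]))

    def update(self, arm, x, reward=None):
        """Update arm statistics; reward=None marks missing bandit feedback."""
        weight = 1.0
        if reward is None:
            reward = self._impute(arm, x)
            if reward is None:
                return  # nothing observed for this arm yet, skip the round
            weight = self.impute_weight
        else:
            self.history[arm].append((x, reward))
        self.A[arm] += weight * np.outer(x, x)
        self.b[arm] += weight * reward * x


# Simulated stream: linear rewards, about 70% of them hidden from the learner.
rng = np.random.default_rng(0)
n_arms, dim = 3, 4
true_theta = rng.normal(size=(n_arms, dim))
agent = LinUCBWithImputation(n_arms, dim, alpha=0.5)
for t in range(500):
    x = rng.normal(size=dim)
    arm = agent.select(x)
    reward = float(true_theta[arm] @ x + 0.1 * rng.normal())
    observed = reward if rng.random() < 0.3 else None
    agent.update(arm, x, observed)
```

In the simulated loop, rounds with hidden rewards still update the per-arm statistics through imputed values, whereas plain LinUCB would discard them. Down-weighting imputed rewards (`impute_weight < 1`) is one simple way to limit the effect of poor imputations; a graph-based semi-supervised rule such as label propagation could be swapped in for the kNN average.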
