Interactive Submodular Bandit

In many machine learning applications, submodular functions are used to model the utility or payoff of a set, such as news items to recommend, sensors to deploy in a terrain, or nodes to influence in a social network. At the heart of all these applications lies the assumption that the underlying utility/payoff function is known a priori, so that maximizing it is in principle possible. In real-life situations, however, the utility function is not fully known in advance and can only be estimated through interaction. For instance, whether a user likes a movie can be reliably evaluated only after it is shown to her, and the range of influence of a user in a social network can be estimated only after she is selected to advertise the product. We model such problems as interactive submodular bandit optimization: in each round we receive a context (e.g., previously selected movies) and have to choose an action (e.g., propose a new movie). We then receive noisy feedback about the utility of the action (e.g., a rating), which we model as a submodular function over the context-action space. We develop SM-UCB, which efficiently trades off exploration (collecting more data) and exploitation (proposing a good action given the gathered data) and achieves an $O(\sqrt{T})$ regret bound after $T$ rounds of interaction. Given a bounded-RKHS-norm kernel over the context-action-payoff space that governs the smoothness of the utility function, SM-UCB maintains an upper confidence bound on the payoff function, which allows it to asymptotically achieve no regret. Finally, we evaluate our method on four concrete applications: movie recommendation (on the MovieLens dataset), news recommendation (on the Yahoo! Webscope dataset), interactive influence maximization (on a subset of the Facebook network), and personalized data summarization (on the Reuters Corpus). In all these applications, we observe that SM-UCB consistently outperforms the prior art.
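The upper-confidence-bound idea behind SM-UCB can be illustrated with a minimal sketch: model the noisy payoffs of context-action pairs with Gaussian process regression, and in each round score every candidate action by its posterior mean plus an exploration bonus proportional to its posterior standard deviation. Everything below (the RBF kernel, the `beta` and `noise` parameters, the feature encoding of context-action pairs) is an illustrative assumption, not the paper's exact kernel or confidence-width schedule.

```python
import numpy as np

def rbf_kernel(A, B, length_scale=1.0):
    # Squared-exponential kernel over (context, action) feature vectors.
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * length_scale ** 2))

def ucb_scores(X_train, y_train, X_cand, beta=2.0, noise=0.1):
    """GP posterior mean + beta * posterior std for each candidate.

    X_train : (n, d) features of previously played context-action pairs
    y_train : (n,)   their observed noisy payoffs
    X_cand  : (m, d) features of candidate actions in the current round
    """
    if len(X_train) == 0:
        # No data yet: every candidate is maximally uncertain, so explore.
        return np.full(len(X_cand), np.inf)
    K = rbf_kernel(X_train, X_train) + noise ** 2 * np.eye(len(X_train))
    K_inv = np.linalg.inv(K)
    k_star = rbf_kernel(X_cand, X_train)          # (m, n) cross-covariances
    mu = k_star @ K_inv @ y_train                  # posterior mean
    # Posterior variance: k(x,x) - k_*^T K^{-1} k_*  (k(x,x) = 1 for RBF).
    var = 1.0 - np.einsum('ij,jk,ik->i', k_star, K_inv, k_star)
    return mu + beta * np.sqrt(np.maximum(var, 0.0))
```

In each round, the learner would play `X_cand[np.argmax(ucb_scores(...))]`, observe the noisy payoff, and append the pair to the training set; the bonus term shrinks where data accumulates, which is what drives the exploration-exploitation trade-off.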
