Max-Utility Based Arm Selection Strategy For Sequential Query Recommendations

We consider the query recommendation problem in closed-loop interactive learning settings such as online information gathering and exploratory analytics. The problem can be naturally modelled using the Multi-Armed Bandit (MAB) framework with countably many arms. Standard MAB algorithms for countably many arms begin by selecting a random set of candidate arms and then applying a standard MAB algorithm, e.g., UCB, on this candidate set downstream. We show that such a selection strategy often results in higher cumulative regret, and to address this, we propose a selection strategy based on the maximum utility of the arms. We show that in tasks like online information gathering, where sequential query recommendations are employed, the sequences of queries are correlated and the number of potentially optimal queries can be reduced to a manageable size by selecting queries with maximum utility with respect to the currently executing query. Our experimental results on log files from a recent real-world online literature discovery service demonstrate that the proposed arm selection strategy substantially reduces cumulative regret compared with state-of-the-art baseline algorithms. Our data model and source code are available at https://anonymous.4open.science/r/0e5ad6b7-ac02-4577-9212-c9d505d3dbdb/.
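As a rough illustration of the pipeline the abstract describes, the sketch below first restricts the countable arm set to the top-k arms by utility with respect to the currently executing query, and then runs UCB1 on that candidate set. This is a minimal sketch under stated assumptions: the cosine-similarity utility over query embeddings, the function names (select_candidates_max_utility, ucb1_on_candidates, pull), and all parameter values are illustrative and are not taken from the paper's implementation.

```python
import numpy as np

def select_candidates_max_utility(arm_embeddings, current_query_emb, k):
    """Return indices of the k arms with the highest utility w.r.t. the
    currently executing query; cosine similarity is an illustrative
    stand-in for the paper's utility function."""
    norms = np.linalg.norm(arm_embeddings, axis=1) * np.linalg.norm(current_query_emb)
    scores = arm_embeddings @ current_query_emb / (norms + 1e-12)
    return np.argsort(scores)[-k:]

def ucb1_on_candidates(pull, candidates, horizon):
    """Run standard UCB1 on the pre-selected candidate set.
    `pull(arm_index)` returns an observed reward in [0, 1]."""
    k = len(candidates)
    counts = np.zeros(k)
    means = np.zeros(k)
    for t in range(horizon):
        if t < k:                                   # initialisation: play each candidate once
            i = t
        else:
            bonus = np.sqrt(2.0 * np.log(t + 1) / counts)
            i = int(np.argmax(means + bonus))       # optimistic arm choice
        r = pull(candidates[i])
        counts[i] += 1
        means[i] += (r - means[i]) / counts[i]      # incremental mean update
    return candidates[int(np.argmax(means))]        # recommended query (arm)

# Usage with synthetic embeddings and Bernoulli rewards
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    arms = rng.normal(size=(1000, 32))              # embeddings of candidate queries
    current = rng.normal(size=32)                   # embedding of the executing query
    cand = select_candidates_max_utility(arms, current, k=20)
    best = ucb1_on_candidates(lambda a: float(rng.random() < 0.5), cand, horizon=500)
```

In a random-selection baseline, the candidate set would instead be drawn uniformly from the arm pool; the max-utility step above is the only part that changes, which is what makes the comparison of cumulative regret between the two strategies direct.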
