Finding the Bandit in a Graph: Sequential Search-and-Stop

We consider the problem where an agent wants to find a hidden object that is randomly located at some vertex of a directed acyclic graph (DAG) according to a fixed but possibly unknown distribution. The agent can only examine vertices whose in-neighbors have already been examined. In this paper, we study a learning setting in which the agent may stop before finding the object and restart the search on a new, independent instance of the same problem. Our goal is to maximize the total number of hidden objects found given a time budget. The agent can thus skip an instance after realizing that it would spend too much time on it. Our contributions are to both search theory and multi-armed bandits. If the distribution is known, we provide a quasi-optimal and efficient stationary strategy. If the distribution is unknown, we additionally show how to sequentially approximate it while acting near-optimally, so as to collect as many hidden objects as possible.
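To make the setting concrete, the following Python sketch simulates one plausible stationary policy under stated assumptions: the agent greedily examines the accessible vertex with the highest probability-to-cost ratio and skips to a fresh instance once that ratio drops below a threshold. The DAG, the costs, the prior, and the `threshold` parameter are all illustrative assumptions, and the greedy index is only a stand-in; it is not the quasi-optimal strategy derived in the paper.

```python
import random

# A minimal sketch of the sequential search-and-stop setting, assuming a
# hypothetical 4-vertex DAG. Vertex names, costs, the prior, `budget`, and
# `threshold` are illustrative; they are not taken from the paper.
parents = {"a": [], "b": ["a"], "c": ["a"], "d": ["b", "c"]}  # in-neighbors
cost = {"a": 1.0, "b": 2.0, "c": 1.0, "d": 3.0}               # examination costs
prior = {"a": 0.1, "b": 0.4, "c": 0.2, "d": 0.3}              # P(object at v)

def search_and_stop(budget, threshold, rng):
    """Greedy index policy: on each instance, examine the accessible vertex
    with the highest probability-to-cost ratio, and abandon the instance
    (restarting on a fresh one) once that ratio falls below `threshold`."""
    found, spent = 0, 0.0
    while spent < budget:
        # Each instance independently hides the object according to the prior.
        target = rng.choices(list(prior), weights=list(prior.values()))[0]
        examined = set()
        while True:
            # Accessible = not yet examined, with all in-neighbors examined.
            accessible = [v for v in parents if v not in examined
                          and all(p in examined for p in parents[v])]
            if not accessible:
                break
            v = max(accessible, key=lambda u: prior[u] / cost[u])
            if prior[v] / cost[v] < threshold:
                if not examined:
                    return found  # the policy abandons every instance at once
                break             # skip this instance: expected return too low
            spent += cost[v]
            if spent > budget:
                return found      # time budget exhausted mid-search
            examined.add(v)
            if v == target:
                found += 1        # object collected; move to the next instance
                break
    return found

print(search_and_stop(budget=100.0, threshold=0.05, rng=random.Random(0)))
```

Varying `threshold` trades off depth of search per instance against the number of instances attempted within the budget; a more careful policy would also recompute the posterior probabilities after each failed examination.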
