Finding the Bandit in a Graph: Sequential Search-and-Stop

We consider the problem where an agent wants to find a hidden object that is randomly located at some vertex of a directed acyclic graph (DAG) according to a fixed but possibly unknown distribution. The agent can only examine vertices whose in-neighbors have already been examined. In this paper, we study a learning setting in which the agent may stop before finding the object and restart the search on a new, independent instance of the same problem. Our goal is to maximize the total number of hidden objects found given a time budget. The agent can thus skip an instance after realizing that it would spend too much time on it. Our contributions are to both search theory and multi-armed bandits. If the distribution is known, we provide a quasi-optimal and efficient stationary strategy. If the distribution is unknown, we additionally show how to sequentially approximate it while acting near-optimally, so as to collect as many hidden objects as possible.
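To make the setting concrete, the following Python sketch simulates one plausible stationary policy under stated assumptions: the agent greedily examines the accessible vertex with the highest probability-to-cost ratio and skips to a fresh instance once that ratio drops below a threshold. The DAG, the costs, the prior, and the `threshold` parameter are all illustrative assumptions, and the greedy index is only a stand-in; it is not the quasi-optimal strategy derived in the paper.

```python
import random

# A minimal sketch of the sequential search-and-stop setting, assuming a
# hypothetical 4-vertex DAG. Vertex names, costs, the prior, `budget`, and
# `threshold` are illustrative; they are not taken from the paper.
parents = {"a": [], "b": ["a"], "c": ["a"], "d": ["b", "c"]}  # in-neighbors
cost = {"a": 1.0, "b": 2.0, "c": 1.0, "d": 3.0}               # examination costs
prior = {"a": 0.1, "b": 0.4, "c": 0.2, "d": 0.3}              # P(object at v)

def search_and_stop(budget, threshold, rng):
    """Greedy index policy: on each instance, examine the accessible vertex
    with the highest probability-to-cost ratio, and abandon the instance
    (restarting on a fresh one) once that ratio falls below `threshold`."""
    found, spent = 0, 0.0
    while spent < budget:
        # Each instance independently hides the object according to the prior.
        target = rng.choices(list(prior), weights=list(prior.values()))[0]
        examined = set()
        while True:
            # Accessible = not yet examined, with all in-neighbors examined.
            accessible = [v for v in parents if v not in examined
                          and all(p in examined for p in parents[v])]
            if not accessible:
                break
            v = max(accessible, key=lambda u: prior[u] / cost[u])
            if prior[v] / cost[v] < threshold:
                if not examined:
                    return found  # the policy abandons every instance at once
                break             # skip this instance: expected return too low
            spent += cost[v]
            if spent > budget:
                return found      # time budget exhausted mid-search
            examined.add(v)
            if v == target:
                found += 1        # object collected; move to the next instance
                break
    return found

print(search_and_stop(budget=100.0, threshold=0.05, rng=random.Random(0)))
```

Varying `threshold` trades off depth of search per instance against the number of instances attempted within the budget; a more careful policy would also recompute the posterior probabilities after each failed examination.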
