Combinatorial Bandits