Combinatorial Bandits
[1] José Niño-Mora,et al. Computing a Classic Index for Finite-Horizon Bandits , 2011, INFORMS J. Comput..
[2] T. Sharot. The optimism bias , 2011, Current Biology.
[3] Vianney Perchet,et al. The multi-armed bandit problem with covariates , 2011, ArXiv.
[4] Eric Moulines,et al. On Upper-Confidence Bound Policies for Switching Bandit Problems , 2011, ALT.
[5] José Niño-Mora. Computing a Classic Index for Finite-Horizon Bandits , 2011 .
[6] Kevin D. Glazebrook,et al. Multi-Armed Bandit Allocation Indices , 2011 .
[7] Vianney Perchet,et al. Approachability of Convex Sets in Games with Partial Monitoring , 2011, J. Optim. Theory Appl..
[8] Sébastien Gerchinovitz,et al. Sparsity Regret Bounds for Individual Sequences in Online Linear Regression , 2011, COLT.
[9] Aleksandrs Slivkins,et al. Contextual Bandits with Similarity Information , 2009, COLT.
[10] Akimichi Takemura,et al. An asymptotically optimal policy for finite support models in the multiarmed bandit problem , 2009, Machine Learning.
[11] Csaba Szepesvári,et al. Toward a classification of finite partial-monitoring games , 2010, Theor. Comput. Sci..
[12] Peter Auer,et al. UCB revisited: Improved regret bounds for the stochastic multi-armed bandit problem , 2010, Period. Math. Hung..
[13] O. Catoni. Challenging the empirical mean and empirical variance: a deviation study , 2010, 1009.2048.
[14] Ole-Christoffer Granmo,et al. Solving two-armed Bernoulli bandit problems using a Bayesian learning automaton , 2010, Int. J. Intell. Comput. Cybern..
[15] Wouter M. Koolen,et al. Hedging Structured Concepts , 2010, COLT.
[16] Jean-Yves Audibert,et al. Regret Bounds and Minimax Policies under Partial Monitoring , 2010, J. Mach. Learn. Res..
[17] John N. Tsitsiklis,et al. Linearly Parameterized Bandits , 2008, Math. Oper. Res..
[18] Daniel A. Braun,et al. A Minimum Relative Entropy Principle for Learning and Acting , 2008, J. Artif. Intell. Res..
[19] Rémi Munos,et al. Pure Exploration in Multi-armed Bandits Problems , 2009, ALT.
[20] Jean-Yves Audibert,et al. Minimax Policies for Adversarial and Stochastic Bandits , 2009, COLT.
[21] Jacob D. Abernethy,et al. Beating the adaptive bandit with high probability , 2009, 2009 Information Theory and Applications Workshop.
[22] Katya Scheinberg,et al. Introduction to derivative-free optimization , 2010, Math. Comput..
[23] Csaba Szepesvári,et al. Exploration-exploitation tradeoff using variance estimates in multi-armed bandits , 2009, Theor. Comput. Sci..
[24] T. Lai. Martingales in Sequential Analysis and Time Series, 1945-1985 , 2009 .
[25] Peter Auer,et al. Near-optimal Regret Bounds for Reinforcement Learning , 2008, J. Mach. Learn. Res..
[26] Michael L. Littman,et al. An analysis of model-based Interval Estimation for Markov Decision Processes , 2008, J. Comput. Syst. Sci..
[27] I. Sonin. A generalized Gittins index for a Markov chain and its recursive calculation , 2008 .
[28] Filip Radlinski,et al. Learning diverse rankings with multi-armed bandits , 2008, ICML '08.
[29] Ambuj Tewari,et al. Efficient bandit algorithms for online multiclass prediction , 2008, ICML '08.
[30] Thomas P. Hayes,et al. High-Probability Regret Bounds for Bandit Online Linear Optimization , 2008, COLT.
[31] E. Ionides. Truncated Importance Sampling , 2008 .
[32] Elad Hazan,et al. Competing in the Dark: An Efficient Algorithm for Bandit Linear Optimization , 2008, COLT.
[33] Thomas P. Hayes,et al. The Price of Bandit Information for Online Optimization , 2007, NIPS.
[34] Yoram Singer,et al. A primal-dual perspective of online learning algorithms , 2007, Machine Learning.
[35] J. Aldrich. But you have to remember P.J. Daniell of Sheffield , 2007 .
[36] Manfred K. Warmuth,et al. Learning Permutations with Exponential Weights , 2007, COLT.
[37] Uriel G. Rothblum,et al. Risk-Sensitive and Risk-Neutral Multiarmed Bandits , 2007, Math. Oper. Res..
[38] Santosh S. Vempala,et al. The geometry of logconcave functions and sampling algorithms , 2007, Random Struct. Algorithms.
[39] Tamás Linder,et al. The On-Line Shortest Path Problem Under Partial Monitoring , 2007, J. Mach. Learn. Res..
[40] András György,et al. The On-Line Shortest Path Problem Under Partial Monitoring , 2007 .
[41] J. K. Hunter,et al. Measure Theory , 2007 .
[42] Shie Mannor,et al. Action Elimination and Stopping Conditions for the Multi-Armed Bandit and Reinforcement Learning Problems , 2006, J. Mach. Learn. Res..
[43] Nicolò Cesa-Bianchi,et al. Regret Minimization Under Partial Monitoring , 2006, 2006 IEEE Information Theory Workshop - ITW '06 Punta del Este.
[44] Gábor Lugosi,et al. Prediction, learning, and games , 2006 .
[45] Santosh S. Vempala,et al. On The Approximability Of The Traveling Salesman Problem , 2006, Comb..
[46] Thomas P. Hayes,et al. Robbing the bandit: less regret in online geometric optimization against an adaptive adversary , 2006, SODA '06.
[47] Stephen P. Boyd,et al. Convex Optimization , 2004, Algorithms and Theory of Computation Handbook.
[48] Tapio Elomaa,et al. On Following the Perturbed Leader in the Bandit Setting , 2005, ALT.
[49] Marcus Hutter,et al. Adaptive Online Prediction by Following the Perturbed Leader , 2005, J. Mach. Learn. Res..
[50] T. Lai,et al. Sequential Generalized Likelihood Ratios and Adaptive Treatment Allocation for Optimal Sequential Selection , 2006 .
[51] Eric Vigoda,et al. A polynomial-time approximation algorithm for the permanent of a matrix with nonnegative entries , 2004, JACM.
[52] Avrim Blum,et al. Online Geometric Optimization in the Bandit Setting Against an Adaptive Adversary , 2004, COLT.
[53] Baruch Awerbuch,et al. Adaptive routing with end-to-end feedback: distributed learning and geometric approaches , 2004, STOC '04.
[54] Peter Auer,et al. Finite-time Analysis of the Multiarmed Bandit Problem , 2002, Machine Learning.
[55] Michael Kearns,et al. Near-Optimal Reinforcement Learning in Polynomial Time , 2002, Machine Learning.
[56] Mark Herbster,et al. Tracking the Best Expert , 1995, Machine Learning.
[57] Santosh S. Vempala,et al. Efficient algorithms for online decision problems , 2005, J. Comput. Syst. Sci..
[58] John N. Tsitsiklis,et al. The Sample Complexity of Exploration in the Multi-Armed Bandit Problem , 2004, J. Mach. Learn. Res..
[59] A. Burnetas,et al. ASYMPTOTIC BAYES ANALYSIS FOR THE FINITE-HORIZON ONE-ARMED-BANDIT PROBLEM , 2003, Probability in the Engineering and Informational Sciences.
[60] Peter Auer,et al. Using Confidence Bounds for Exploitation-Exploration Trade-offs , 2003, J. Mach. Learn. Res..
[61] Eric R. Ziegel,et al. Generalized Linear Models , 2002, Technometrics.
[62] Manfred K. Warmuth,et al. Path Kernels and Multiplicative Updates , 2002, J. Mach. Learn. Res..
[63] G. Loewenstein,et al. Time Discounting and Time Preference: A Critical Review , 2002 .
[64] Ronen I. Brafman,et al. R-MAX - A General Polynomial Time Algorithm for Near-Optimal Reinforcement Learning , 2001, J. Mach. Learn. Res..
[65] Peter Auer,et al. The Nonstochastic Multiarmed Bandit Problem , 2002, SIAM J. Comput..
[66] Mark Herbster,et al. Tracking the Best Linear Predictor , 2001, J. Mach. Learn. Res..
[67] P. Massart,et al. Adaptive estimation of a quadratic functional by model selection , 2000 .
[68] Philip M. Long,et al. Apple Tasting , 2000, Inf. Comput..
[69] Chun-Hung Chen,et al. Simulation Budget Allocation for Further Enhancing the Efficiency of Ordinal Optimization , 2000, Discret. Event Dyn. Syst..
[70] Sanjeev R. Kulkarni,et al. Finite-time lower bounds for the two-armed bandit problem , 2000, IEEE Trans. Autom. Control..
[71] R. Rockafellar,et al. Optimization of conditional value-at-risk , 2000 .
[72] A. Rustichini. Minimizing Regret : The General Case , 1999 .
[73] Geoffrey J. Gordon. Regret bounds for prediction problems , 1999, COLT '99.
[74] Demosthenis Teneketzis,et al. On the optimality of the Gittins index rule for multi-armed bandits with multiple plays , 1995, Math. Methods Oper. Res..
[75] Dana Randall,et al. Sampling spin configurations of an Ising system , 1999, SODA '99.
[76] David Bruce Wilson,et al. How to Get a Perfectly Random Sample from a Generic Markov Chain and Generate a Random Spanning Tree of a Directed Graph , 1998, J. Algorithms.
[77] S. Robertson. The probability ranking principle in IR , 1997 .
[78] Joseph T. Chang,et al. Conditioning as disintegration , 1997 .
[79] T. L. Graves,et al. Asymptotically Efficient Adaptive Choice of Control Laws inControlled Markov Chains , 1997 .
[80] Apostolos Burnetas,et al. Optimal Adaptive Policies for Markov Decision Processes , 1997, Math. Oper. Res..
[81] Robert W. Chen,et al. Bandit problems with infinitely many arms , 1997 .
[82] A. Burnetas,et al. Optimal Adaptive Policies for Sequential Allocation Problems , 1996 .
[83] Jon A. Wellner,et al. Weak Convergence and Empirical Processes: With Applications to Statistics , 1996 .
[84] R. Agrawal. Sample mean based index policies by O(log n) regret for the multi-armed bandit problem , 1995, Advances in Applied Probability.
[85] S. Axler. Linear Algebra Done Right , 1995, Undergraduate Texts in Mathematics.
[86] H Robbins,et al. Sequential choice from several populations. , 1995, Proceedings of the National Academy of Sciences of the United States of America.
[87] M. Talagrand. The missing factor in Hoeffding's inequalities , 1995 .
[88] I. Karatzas,et al. Dynamic Allocation Problems in Continuous Time , 1994 .
[89] Umesh V. Vazirani,et al. An Introduction to Computational Learning Theory , 1994 .
[90] J. Tsitsiklis. A short proof of the Gittins index theorem , 1993, Proceedings of 32nd IEEE Conference on Decision and Control.
[91] Mark Jerrum,et al. Polynomial-Time Approximation Algorithms for the Ising Model , 1990, SIAM J. Comput..
[92] R. Pemantle,et al. Local Characteristics, Entropy and Limit Theorems for Spanning Trees and Domino Tilings Via Transfer-Impedances , 1993, math/0404048.
[93] Leslie Pack Kaelbling,et al. Learning in embedded systems , 1993 .
[94] M. K. Ghosh,et al. Discrete-time controlled Markov processes with average cost criterion: a survey , 1993 .
[95] R. Weber. On the Gittins Index for Multiarmed Bandits , 1992 .
[96] Yu-Chi Ho,et al. Ordinal optimization of DEDS , 1992, Discret. Event Dyn. Syst..
[97] W. Willinger,et al. Universal Portfolios , 1991 .
[98] R. Weber,et al. On an index policy for restless bandits , 1990, Journal of Applied Probability.
[99] R. Gray. Entropy and Information Theory , 1990, Springer New York.
[100] Manfred K. Warmuth,et al. The weighted majority algorithm , 1989, 30th Annual Symposium on Foundations of Computer Science.
[101] G. Rappl. On Linear Convergence of a Class of Random Search Algorithms , 1989 .
[102] H. Komiya. Elementary proof for Sion's minimax theorem , 1988 .
[103] P. Whittle. Restless bandits: activity allocation in a changing world , 1988, Journal of Applied Probability.
[104] J. Walrand,et al. Asymptotically efficient allocation rules for the multiarmed bandit problem with multiple plays-Part II: Markovian rewards , 1987 .
[105] T. Lai. Adaptive treatment allocation and the multi-armed bandit problem , 1987 .
[106] John N. Tsitsiklis,et al. The Complexity of Markov Decision Processes , 1987, Math. Oper. Res..
[107] Michael N. Katehakis,et al. Linear Programming for Finite State Multi-Armed Bandit Problems , 1986, Math. Oper. Res..
[108] Lodewijk C. M. Kallenberg,et al. A Note on M. N. Katehakis' and Y.-R. Chen's Computation of the Gittins Index , 1986, Math. Oper. Res..
[109] H. R. Lerche. Boundary Crossing of Brownian Motion , 1986 .
[110] Jean Walrand,et al. Extensions of the multiarmed bandit problem: The discounted case , 1985 .
[111] H. Robbins,et al. Asymptotically efficient adaptive allocation rules , 1985 .
[112] James O. Berger,et al. Statistical Decision Theory and Bayesian Analysis, Second Edition , 1985 .
[113] Peter G. Doyle,et al. Random Walks and Electric Networks , 1987 .
[114] R. Bellman. Eye of the Hurricane , 1984 .
[115] P. Whittle. Multi‐Armed Bandits and the Gittins Index , 1980 .
[116] A. Tversky,et al. Prospect theory: an analysis of decision under risk , 1979 .
[117] G. Box. Robustness in the Strategy of Scientific Model Building , 1979 .
[118] J. Gittins. Bandit processes and dynamic allocation indices , 1979 .
[119] Peter Armitage,et al. Sequential Medical Trials , 1961, Biomedicine / [publiee pour l'A.A.I.C.I.G.].
[120] G. Box. Science and Statistics , 1976 .
[121] G. Simons. Great Expectations: Theory of Optimal Stopping , 1973 .
[122] L. Le Cam. Convergence of Estimates Under Dimensionality Restrictions , 1973 .
[123] H. Robbins,et al. Boundary Crossing Probabilities for the Wiener Process and Sample Sums , 1970 .
[124] H. Wynn. The Sequential Generation of $D$-Optimum Experimental Designs , 1970 .
[125] H. Chernoff,et al. Sequential decisions in the control of a space-ship (finite fuel) , 1967, Journal of Applied Probability.
[126] Walter T. Federer,et al. Sequential Design of Experiments , 1967 .
[127] Per Martin-Löf,et al. The Definition of Random Sequences , 1966, Inf. Control..
[128] R. Strauch. Negative Dynamic Programming , 1966 .
[129] F. J. Anscombe. Sequential Medical Trials , 1963 .
[130] M. E. Maron,et al. On Relevance, Probabilistic Indexing and Information Retrieval , 1960, JACM.
[131] W. Vogel. An Asymptotic Minimax Theorem for the Two Armed Bandit Problem , 1960 .
[132] J. Kiefer,et al. The Equivalence of Two Extremum Problems , 1960, Canadian Journal of Mathematics.
[133] M. Sion. On general minimax theorems , 1958 .
[134] James Hannan. Approximation to Bayes Risk in Repeated Play , 1958 .
[135] R. N. Bradt,et al. On Sequential Designs for Maximizing the Sum of $n$ Observations , 1956 .
[136] Philip Wolfe,et al. An algorithm for quadratic programming , 1956 .
[137] D. Bernoulli. Exposition of a New Theory on the Measurement of Risk , 1954 .
[138] R. R. Bush,et al. A Stochastic Model with Applications to Learning , 1953 .
[139] A. C. Berry. The accuracy of the Gaussian approximation to the sum of independent variates , 1941 .
[140] P. Samuelson. A Note on Measurement of Utility , 1937 .
[141] W. R. Thompson. On the Theory of Apportionment , 1935 .
[142] W. R. Thompson. ON THE LIKELIHOOD THAT ONE UNKNOWN PROBABILITY EXCEEDS ANOTHER IN VIEW OF THE EVIDENCE OF TWO SAMPLES , 1933 .
[143] J. Neumann. Zur Theorie der Gesellschaftsspiele , 1928 .
[144] C. McDiarmid. Concentration , 1998, Probabilistic Methods for Algorithmic Discrete Mathematics.
[145] T. Bayes. LII. An essay towards solving a problem in the doctrine of chances. By the late Rev. Mr. Bayes, F. R. S. communicated by Mr. Price, in a letter to John Canton, A. M. F. R. S , 1763, Philosophical Transactions of the Royal Society of London.