Combinatorial Bandits
[1] José Niño-Mora,et al. Computing a Classic Index for Finite-Horizon Bandits , 2011, INFORMS J. Comput..
[2] T. Sharot. The optimism bias , 2011, Current Biology.
[3] Vianney Perchet,et al. The multi-armed bandit problem with covariates , 2011, ArXiv.
[4] Eric Moulines,et al. On Upper-Confidence Bound Policies for Switching Bandit Problems , 2011, ALT.
[5] José Niño-Mora. Computing a Classic Index for Finite-Horizon Bandits , 2011 .
[6] Kevin D. Glazebrook,et al. Multi-Armed Bandit Allocation Indices , 2011 .
[7] Vianney Perchet,et al. Approachability of Convex Sets in Games with Partial Monitoring , 2011, J. Optim. Theory Appl..
[8] Sébastien Gerchinovitz,et al. Sparsity Regret Bounds for Individual Sequences in Online Linear Regression , 2011, COLT.
[9] Aleksandrs Slivkins,et al. Contextual Bandits with Similarity Information , 2009, COLT.
[10] Akimichi Takemura,et al. An asymptotically optimal policy for finite support models in the multiarmed bandit problem , 2009, Machine Learning.
[11] Csaba Szepesvári,et al. Toward a classification of finite partial-monitoring games , 2010, Theor. Comput. Sci..
[12] Peter Auer,et al. UCB revisited: Improved regret bounds for the stochastic multi-armed bandit problem , 2010, Period. Math. Hung..
[13] O. Catoni. Challenging the empirical mean and empirical variance: a deviation study , 2010, 1009.2048.
[14] Ole-Christoffer Granmo,et al. Solving two-armed Bernoulli bandit problems using a Bayesian learning automaton , 2010, Int. J. Intell. Comput. Cybern..
[15] Wouter M. Koolen,et al. Hedging Structured Concepts , 2010, COLT.
[16] Jean-Yves Audibert,et al. Regret Bounds and Minimax Policies under Partial Monitoring , 2010, J. Mach. Learn. Res..
[17] John N. Tsitsiklis,et al. Linearly Parameterized Bandits , 2008, Math. Oper. Res..
[18] Daniel A. Braun,et al. A Minimum Relative Entropy Principle for Learning and Acting , 2008, J. Artif. Intell. Res..
[19] Rémi Munos,et al. Pure Exploration in Multi-armed Bandits Problems , 2009, ALT.
[20] Jean-Yves Audibert,et al. Minimax Policies for Adversarial and Stochastic Bandits , 2009, COLT.
[21] Jacob D. Abernethy,et al. Beating the adaptive bandit with high probability , 2009, 2009 Information Theory and Applications Workshop.
[22] Katya Scheinberg,et al. Introduction to derivative-free optimization , 2010, Math. Comput..
[23] Csaba Szepesvári,et al. Exploration-exploitation tradeoff using variance estimates in multi-armed bandits , 2009, Theor. Comput. Sci..
[24] T. Lai. Martingales in Sequential Analysis and Time Series, 1945-1985 , 2009 .
[25] Peter Auer,et al. Near-optimal Regret Bounds for Reinforcement Learning , 2008, J. Mach. Learn. Res..
[26] Michael L. Littman,et al. An analysis of model-based Interval Estimation for Markov Decision Processes , 2008, J. Comput. Syst. Sci..
[27] I. Sonin. A generalized Gittins index for a Markov chain and its recursive calculation , 2008 .
[28] Filip Radlinski,et al. Learning diverse rankings with multi-armed bandits , 2008, ICML '08.
[29] Ambuj Tewari,et al. Efficient bandit algorithms for online multiclass prediction , 2008, ICML '08.
[30] Thomas P. Hayes,et al. High-Probability Regret Bounds for Bandit Online Linear Optimization , 2008, COLT.
[31] E. Ionides. Truncated Importance Sampling , 2008 .
[32] Elad Hazan,et al. Competing in the Dark: An Efficient Algorithm for Bandit Linear Optimization , 2008, COLT.
[33] Thomas P. Hayes,et al. The Price of Bandit Information for Online Optimization , 2007, NIPS.
[34] Yoram Singer,et al. A primal-dual perspective of online learning algorithms , 2007, Machine Learning.
[35] J. Aldrich. But you have to remember P.J. Daniell of Sheffield , 2007 .
[36] Manfred K. Warmuth,et al. Learning Permutations with Exponential Weights , 2007, COLT.
[37] Uriel G. Rothblum,et al. Risk-Sensitive and Risk-Neutral Multiarmed Bandits , 2007, Math. Oper. Res..
[38] Santosh S. Vempala,et al. The geometry of logconcave functions and sampling algorithms , 2007, Random Struct. Algorithms.
[39] Tamás Linder,et al. The On-Line Shortest Path Problem Under Partial Monitoring , 2007, J. Mach. Learn. Res..
[40] András György,et al. The On-Line Shortest Path Problem Under Partial Monitoring , 2007 .
[41] J. K. Hunter,et al. Measure Theory , 2007 .
[42] Shie Mannor,et al. Action Elimination and Stopping Conditions for the Multi-Armed Bandit and Reinforcement Learning Problems , 2006, J. Mach. Learn. Res..
[43] Nicolò Cesa-Bianchi,et al. Regret Minimization Under Partial Monitoring , 2006, 2006 IEEE Information Theory Workshop - ITW '06 Punta del Este.
[44] Gábor Lugosi,et al. Prediction, learning, and games , 2006 .
[45] Santosh S. Vempala,et al. On The Approximability Of The Traveling Salesman Problem , 2006, Comb..
[46] Thomas P. Hayes,et al. Robbing the bandit: less regret in online geometric optimization against an adaptive adversary , 2006, SODA '06.
[47] Stephen P. Boyd,et al. Convex Optimization , 2004, Algorithms and Theory of Computation Handbook.
[48] Tapio Elomaa,et al. On Following the Perturbed Leader in the Bandit Setting , 2005, ALT.
[49] Marcus Hutter,et al. Adaptive Online Prediction by Following the Perturbed Leader , 2005, J. Mach. Learn. Res..
[50] T. Lai,et al. Sequential Generalized Likelihood Ratios and Adaptive Treatment Allocation for Optimal Sequential Selection , 2006 .
[51] Eric Vigoda,et al. A polynomial-time approximation algorithm for the permanent of a matrix with nonnegative entries , 2004, JACM.
[52] Avrim Blum,et al. Online Geometric Optimization in the Bandit Setting Against an Adaptive Adversary , 2004, COLT.
[53] Baruch Awerbuch,et al. Adaptive routing with end-to-end feedback: distributed learning and geometric approaches , 2004, STOC '04.
[54] Peter Auer,et al. Finite-time Analysis of the Multiarmed Bandit Problem , 2002, Machine Learning.
[55] Michael Kearns,et al. Near-Optimal Reinforcement Learning in Polynomial Time , 2002, Machine Learning.
[56] Mark Herbster,et al. Tracking the Best Expert , 1995, Machine Learning.
[57] Santosh S. Vempala,et al. Efficient algorithms for online decision problems , 2005, J. Comput. Syst. Sci..
[58] John N. Tsitsiklis,et al. The Sample Complexity of Exploration in the Multi-Armed Bandit Problem , 2004, J. Mach. Learn. Res..
[59] A. Burnetas,et al. ASYMPTOTIC BAYES ANALYSIS FOR THE FINITE-HORIZON ONE-ARMED-BANDIT PROBLEM , 2003, Probability in the Engineering and Informational Sciences.
[60] Peter Auer,et al. Using Confidence Bounds for Exploitation-Exploration Trade-offs , 2003, J. Mach. Learn. Res..
[61] Eric R. Ziegel,et al. Generalized Linear Models , 2002, Technometrics.
[62] Manfred K. Warmuth,et al. Path Kernels and Multiplicative Updates , 2002, J. Mach. Learn. Res..
[63] G. Loewenstein,et al. Time Discounting and Time Preference: A Critical Review , 2002 .
[64] Ronen I. Brafman,et al. R-MAX - A General Polynomial Time Algorithm for Near-Optimal Reinforcement Learning , 2001, J. Mach. Learn. Res..
[65] Peter Auer,et al. The Nonstochastic Multiarmed Bandit Problem , 2002, SIAM J. Comput..
[66] Mark Herbster,et al. Tracking the Best Linear Predictor , 2001, J. Mach. Learn. Res..
[67] P. Massart,et al. Adaptive estimation of a quadratic functional by model selection , 2000 .
[68] Philip M. Long,et al. Apple Tasting , 2000, Inf. Comput..
[69] Chun-Hung Chen,et al. Simulation Budget Allocation for Further Enhancing the Efficiency of Ordinal Optimization , 2000, Discret. Event Dyn. Syst..
[70] Sanjeev R. Kulkarni,et al. Finite-time lower bounds for the two-armed bandit problem , 2000, IEEE Trans. Autom. Control..
[71] R. Rockafellar,et al. Optimization of conditional value-at-risk , 2000 .
[72] A. Rustichini. Minimizing Regret : The General Case , 1999 .
[73] Geoffrey J. Gordon. Regret bounds for prediction problems , 1999, COLT '99.
[74] Demosthenis Teneketzis,et al. On the optimality of the Gittins index rule for multi-armed bandits with multiple plays , 1995, Math. Methods Oper. Res..
[75] Dana Randall,et al. Sampling spin configurations of an Ising system , 1999, SODA '99.
[76] David Bruce Wilson,et al. How to Get a Perfectly Random Sample from a Generic Markov Chain and Generate a Random Spanning Tree of a Directed Graph , 1998, J. Algorithms.
[77] S. Robertson. The probability ranking principle in IR , 1997 .
[78] Joseph T. Chang,et al. Conditioning as disintegration , 1997 .
[79] T. L. Graves,et al. Asymptotically Efficient Adaptive Choice of Control Laws inControlled Markov Chains , 1997 .
[80] Apostolos Burnetas,et al. Optimal Adaptive Policies for Markov Decision Processes , 1997, Math. Oper. Res..
[81] Robert W. Chen,et al. Bandit problems with infinitely many arms , 1997 .
[82] A. Burnetas,et al. Optimal Adaptive Policies for Sequential Allocation Problems , 1996 .
[83] Jon A. Wellner,et al. Weak Convergence and Empirical Processes: With Applications to Statistics , 1996 .
[84] R. Agrawal. Sample mean based index policies by O(log n) regret for the multi-armed bandit problem , 1995, Advances in Applied Probability.
[85] S. Axler. Linear Algebra Done Right , 1995, Undergraduate Texts in Mathematics.
[86] H Robbins,et al. Sequential choice from several populations. , 1995, Proceedings of the National Academy of Sciences of the United States of America.
[87] M. Talagrand. The missing factor in Hoeffding's inequalities , 1995 .
[88] I. Karatzas,et al. Dynamic Allocation Problems in Continuous Time , 1994 .
[89] Umesh V. Vazirani,et al. An Introduction to Computational Learning Theory , 1994 .
[90] J. Tsitsiklis. A short proof of the Gittins index theorem , 1993, Proceedings of 32nd IEEE Conference on Decision and Control.
[91] Mark Jerrum,et al. Polynomial-Time Approximation Algorithms for the Ising Model , 1990, SIAM J. Comput..
[92] R. Pemantle,et al. Local Characteristics, Entropy and Limit Theorems for Spanning Trees and Domino Tilings Via Transfer-Impedances , 1993, math/0404048.
[93] Leslie Pack Kaelbling,et al. Learning in embedded systems , 1993 .
[94] M. K. Ghosh,et al. Discrete-time controlled Markov processes with average cost criterion: a survey , 1993 .
[95] R. Weber. On the Gittins Index for Multiarmed Bandits , 1992 .
[96] Yu-Chi Ho,et al. Ordinal optimization of DEDS , 1992, Discret. Event Dyn. Syst..
[97] W. Willinger,et al. Universal Portfolios , 1991 .
[98] R. Weber,et al. On an index policy for restless bandits , 1990, Journal of Applied Probability.
[99] R. Gray. Entropy and Information Theory , 1990, Springer New York.
[100] Manfred K. Warmuth,et al. The weighted majority algorithm , 1989, 30th Annual Symposium on Foundations of Computer Science.
[101] G. Rappl. On Linear Convergence of a Class of Random Search Algorithms , 1989 .
[102] H. Komiya. Elementary proof for Sion's minimax theorem , 1988 .
[103] P. Whittle. Restless bandits: activity allocation in a changing world , 1988, Journal of Applied Probability.
[104] J. Walrand,et al. Asymptotically efficient allocation rules for the multiarmed bandit problem with multiple plays-Part II: Markovian rewards , 1987 .
[105] T. Lai. Adaptive treatment allocation and the multi-armed bandit problem , 1987 .
[106] John N. Tsitsiklis,et al. The Complexity of Markov Decision Processes , 1987, Math. Oper. Res..
[107] Michael N. Katehakis,et al. Linear Programming for Finite State Multi-Armed Bandit Problems , 1986, Math. Oper. Res..
[108] Lodewijk C. M. Kallenberg,et al. A Note on M. N. Katehakis' and Y.-R. Chen's Computation of the Gittins Index , 1986, Math. Oper. Res..
[109] H. R. Lerche. Boundary Crossing of Brownian Motion , 1986 .
[110] Jean Walrand,et al. Extensions of the multiarmed bandit problem: The discounted case , 1985 .
[111] H. Robbins,et al. Asymptotically efficient adaptive allocation rules , 1985 .
[112] James O. Berger,et al. Statistical Decision Theory and Bayesian Analysis, Second Edition , 1985 .
[113] Peter G. Doyle,et al. Random Walks and Electric Networks , 1987 .
[114] R. Bellman. Eye of the Hurricane , 1984 .
[115] P. Whittle. Multi‐Armed Bandits and the Gittins Index , 1980 .
[116] A. Tversky,et al. Prospect theory: an analysis of decision under risk , 1979 .
[117] G. Box. Robustness in the Strategy of Scientific Model Building , 1979 .
[118] J. Gittins. Bandit processes and dynamic allocation indices , 1979 .
[119] Peter Armitage,et al. Sequential Medical Trials , 1961, Biomedicine / [publiee pour l'A.A.I.C.I.G.].
[120] G. Box. Science and Statistics , 1976 .
[121] G. Simons. Great Expectations: Theory of Optimal Stopping , 1973 .
[122] L. Le Cam. Convergence of Estimates Under Dimensionality Restrictions , 1973 .
[123] H. Robbins,et al. Boundary Crossing Probabilities for the Wiener Process and Sample Sums , 1970 .
[124] H. Wynn. The Sequential Generation of $D$-Optimum Experimental Designs , 1970 .
[125] H. Chernoff,et al. Sequential decisions in the control of a space-ship (finite fuel) , 1967, Journal of Applied Probability.
[126] Walter T. Federer,et al. Sequential Design of Experiments , 1967 .
[127] Per Martin-Löf,et al. The Definition of Random Sequences , 1966, Inf. Control..
[128] R. Strauch. Negative Dynamic Programming , 1966 .
[129] F. J. Anscombe. Sequential Medical Trials , 1963 .
[130] M. E. Maron,et al. On Relevance, Probabilistic Indexing and Information Retrieval , 1960, JACM.
[131] W. Vogel. An Asymptotic Minimax Theorem for the Two Armed Bandit Problem , 1960 .
[132] J. Kiefer,et al. The Equivalence of Two Extremum Problems , 1960, Canadian Journal of Mathematics.
[133] M. Sion. On general minimax theorems , 1958 .
[134] James Hannan. Approximation to Bayes Risk in Repeated Play , 1958 .
[135] R. N. Bradt,et al. On Sequential Designs for Maximizing the Sum of $n$ Observations , 1956 .
[136] Philip Wolfe,et al. An algorithm for quadratic programming , 1956 .
[137] D. Bernoulli. Exposition of a New Theory on the Measurement of Risk , 1954 .
[138] R. R. Bush,et al. A Stochastic Model with Applications to Learning , 1953 .
[139] A. C. Berry. The accuracy of the Gaussian approximation to the sum of independent variates , 1941 .
[140] P. Samuelson. A Note on Measurement of Utility , 1937 .
[141] W. R. Thompson. On the Theory of Apportionment , 1935 .
[142] W. R. Thompson. ON THE LIKELIHOOD THAT ONE UNKNOWN PROBABILITY EXCEEDS ANOTHER IN VIEW OF THE EVIDENCE OF TWO SAMPLES , 1933 .
[143] J. Neumann. Zur Theorie der Gesellschaftsspiele , 1928 .
[144] C. McDiarmid. Concentration , 1998, Probabilistic Methods for Algorithmic Discrete Mathematics.
[145] T. Bayes. LII. An essay towards solving a problem in the doctrine of chances. By the late Rev. Mr. Bayes, F. R. S. communicated by Mr. Price, in a letter to John Canton, A. M. F. R. S , 1763, Philosophical Transactions of the Royal Society of London.