Jason L. Loeppky | Ramon Lawrence | Giuseppe Burtini
[1] Vaibhav Srivastava,et al. On optimal foraging and multi-armed bandits , 2013, 2013 51st Annual Allerton Conference on Communication, Control, and Computing (Allerton).
[2] Mehryar Mohri,et al. Multi-armed Bandit Algorithms and Empirical Evaluation , 2005, ECML.
[3] Raphaël Féraud,et al. A Neural Networks Committee for the Contextual Bandit Problem , 2014, ICONIP.
[4] Shipra Agrawal,et al. Further Optimal Regret Bounds for Thompson Sampling , 2012, AISTATS.
[5] Nicolò Cesa-Bianchi,et al. On-line learning with malicious noise and the closure algorithm , 1994, Annals of Mathematics and Artificial Intelligence.
[6] M. A. Girshick,et al. Bayes and minimax solutions of sequential decision problems , 1949 .
[7] Benjamin Van Roy,et al. (More) Efficient Reinforcement Learning via Posterior Sampling , 2013, NIPS.
[8] David Tolusso,et al. Some Properties of the Randomized Play the Winner Rule , 2012 .
[9] Csaba Szepesvári,et al. X-Armed Bandits , 2011, J. Mach. Learn. Res..
[10] Benjamin Van Roy,et al. Learning to Optimize via Posterior Sampling , 2013, Math. Oper. Res..
[11] Joseph P. Romano,et al. On the uniform asymptotic validity of subsampling and the bootstrap , 2012, 1204.2762.
[12] Filip Radlinski,et al. Learning diverse rankings with multi-armed bandits , 2008, ICML '08.
[13] Maurits Kaptein,et al. The use of Thompson sampling to increase estimation precision , 2014, Behavior Research Methods.
[14] R. E. Morin,et al. Factors influencing rate and extent of learning in the presence of mis-informative feedback , 1955, Journal of experimental psychology.
[15] T. L. Lai,et al. Sequential medical trials , 1980, Proceedings of the National Academy of Sciences of the United States of America.
[16] F. Knight. The economic nature of the firm: From Risk, Uncertainty, and Profit , 2009 .
[17] Michèle Basseville,et al. Detecting changes in signals and systems - A survey , 1988, Autom..
[18] Eli Upfal,et al. Multi-Armed Bandits in Metric Spaces , 2008 .
[19] Alexandre Proutière,et al. Unimodal Bandits: Regret Lower Bounds and Optimal Algorithms , 2014, ICML.
[20] Shie Mannor,et al. Action Elimination and Stopping Conditions for the Multi-Armed Bandit and Reinforcement Learning Problems , 2006, J. Mach. Learn. Res..
[21] Hemant Tyagi,et al. Continuum Armed Bandit Problem of Few Variables in High Dimensions , 2013, WAOA.
[22] Aditya Mahajan,et al. Multi‐Armed Bandits, Gittins Index, and its Calculation , 2014 .
[23] Aurélien Garivier,et al. On Upper-Confidence Bound Policies for Non-Stationary Bandit Problems , 2008, 0805.3415.
[24] Li Zhou,et al. A Survey on Contextual Multi-armed Bandits , 2015, ArXiv.
[25] Nicolò Cesa-Bianchi,et al. Combinatorial Bandits , 2012, COLT.
[26] András Lörincz,et al. The many faces of optimism: a unifying approach , 2008, ICML '08.
[27] Shie Mannor,et al. Sub-sampling for Multi-armed Bandits , 2014, ECML/PKDD.
[28] S. Young,et al. On adjusting P-values for multiplicity , 1993 .
[29] Ariel Rubinstein,et al. A Course in Game Theory , 1995 .
[30] Romain Laroche,et al. Contextual Bandit for Active Learning: Active Thompson Sampling , 2014, ICONIP.
[31] Matthew W. Hoffman,et al. An Entropy Search Portfolio for Bayesian Optimization , 2014, ArXiv.
[32] R. Lipshitz,et al. Coping with Uncertainty: A Naturalistic Decision-Making Analysis , 1997 .
[33] Jonathan L. Shapiro,et al. Thompson Sampling in Switching Environments with Bayesian Online Change Detection , 2013, AISTATS.
[34] Yisong Yue,et al. Hierarchical Exploration for Accelerating Contextual Bandits , 2012, ICML.
[35] J. Gittins. Bandit processes and dynamic allocation indices , 1979 .
[36] John N. Tsitsiklis,et al. Linearly Parameterized Bandits , 2008, Math. Oper. Res..
[37] Peter Vrancx,et al. Multi-objective χ-Armed bandits , 2014, 2014 International Joint Conference on Neural Networks (IJCNN).
[38] Wei Chu,et al. Contextual Bandits with Linear Payoff Functions , 2011, AISTATS.
[39] P. W. Jones,et al. Bandit Problems, Sequential Allocation of Experiments , 1987 .
[40] Jonathan L. Shapiro,et al. Thompson Sampling in Switching Environments with Bayesian Online Change Point Detection , 2013, AISTATS.
[41] Alessandro Lazaric,et al. Hybrid Stochastic-Adversarial On-line Learning , 2009, COLT.
[42] Craig Boutilier,et al. Learning and planning in structured worlds , 2000 .
[43] Peter Auer,et al. Using Confidence Bounds for Exploitation-Exploration Trade-offs , 2003, J. Mach. Learn. Res..
[44] Steven L. Scott,et al. A modern Bayesian look at the multi-armed bandit , 2010 .
[45] Rémi Munos,et al. From Bandits to Monte-Carlo Tree Search: The Optimistic Principle Applied to Optimization and Planning , 2014, Found. Trends Mach. Learn..
[46] Rémi Coulom,et al. Efficient Selectivity and Backup Operators in Monte-Carlo Tree Search , 2006, Computers and Games.
[47] Thomas P. Hayes,et al. The Price of Bandit Information for Online Optimization , 2007, NIPS.
[48] Michèle Sebag,et al. Multi-armed Bandit, Dynamic Environments and Meta-Bandits , 2006 .
[49] Peter Auer,et al. The Nonstochastic Multiarmed Bandit Problem , 2002, SIAM J. Comput..
[50] Aurélien Garivier,et al. The KL-UCB Algorithm for Bounded Stochastic Bandits and Beyond , 2011, COLT.
[51] Rémi Munos,et al. Bandit Algorithms for Tree Search , 2007, UAI.
[52] Jean-Yves Audibert,et al. Regret Bounds and Minimax Policies under Partial Monitoring , 2010, J. Mach. Learn. Res..
[53] D. Kahneman,et al. Conditions for intuitive expertise: a failure to disagree , 2009, The American psychologist.
[54] Christopher Jennison,et al. Statistical Approaches to Interim Monitoring of Medical Trials: A Review and Commentary , 1990 .
[55] H. Robbins. Some aspects of the sequential design of experiments , 1952 .
[56] Gábor Lugosi,et al. Prediction, learning, and games , 2006 .
[57] T. L. Lai, Herbert Robbins. Asymptotically Efficient Adaptive Allocation Rules , 1985, Advances in Applied Mathematics.
[58] D. Kahneman. A perspective on judgment and choice: mapping bounded rationality , 2003, The American psychologist.
[59] John Langford,et al. Taming the Monster: A Fast and Simple Algorithm for Contextual Bandits , 2014, ICML.
[60] N. E. Day,et al. Two-stage designs for clinical trials , 1969, Biometrics.
[61] Richard S. Sutton,et al. Integrated Architectures for Learning, Planning, and Reacting Based on Approximating Dynamic Programming , 1990, ML.
[62] Stian Berg,et al. Solving dynamic bandit problems and decentralized games using the Kalman Bayesian learning automaton , 2010 .
[63] Olivier Teytaud,et al. Modification of UCT with Patterns in Monte-Carlo Go , 2006 .
[64] W. R. Thompson. On the Likelihood that One Unknown Probability Exceeds Another in View of the Evidence of Two Samples , 1933 .
[65] John Langford,et al. Efficient Optimal Learning for Contextual Bandits , 2011, UAI.
[66] P. Whittle. Multi‐Armed Bandits and the Gittins Index , 1980 .
[67] J. Gittins,et al. A dynamic allocation index for the discounted multiarmed bandit problem , 1979 .
[68] H. Chernoff. A Measure of Asymptotic Efficiency for Tests of a Hypothesis Based on the Sum of Observations , 1952 .
[69] Nicolò Cesa-Bianchi,et al. Finite-Time Regret Bounds for the Multiarmed Bandit Problem , 1998, ICML.
[70] Huaiyu Zhu. On Information and Sufficiency , 1997 .
[71] R. Agrawal. Sample mean based index policies by O(log n) regret for the multi-armed bandit problem , 1995, Advances in Applied Probability.
[72] S. Panchapakesan,et al. Inference about the Change-Point in a Sequence of Random Variables: A Selection Approach , 1988 .
[73] Lihong Li,et al. An Empirical Evaluation of Thompson Sampling , 2011, NIPS.
[74] H. Vincent Poor,et al. Bandit problems with side observations , 2005, IEEE Transactions on Automatic Control.
[75] W. Hoeffding. Probability Inequalities for Sums of Bounded Random Variables , 1963 .
[76] Philippe Preux,et al. Cold-start Problems in Recommendation Systems via Contextual-bandit Algorithms , 2014, ArXiv.
[77] L. J. Wei,et al. The Randomized Play-the-Winner Rule in Medical Trials , 1978 .
[78] Ambuj Tewari,et al. Efficient bandit algorithms for online multiclass prediction , 2008, ICML '08.
[79] J. Tsitsiklis. A short proof of the Gittins index theorem , 1993, Proceedings of 32nd IEEE Conference on Decision and Control.
[80] Nicolò Cesa-Bianchi,et al. Online Learning with Switching Costs and Other Adaptive Adversaries , 2013, NIPS.
[81] Csaba Szepesvári,et al. Regret Bounds for the Adaptive Control of Linear Quadratic Systems , 2011, COLT.
[82] Hiroshi Nakagawa,et al. Optimal Regret Analysis of Thompson Sampling in Stochastic Multi-armed Bandit Problem with Multiple Plays , 2015, ICML.
[83] Csaba Szepesvári,et al. Online Optimization in X-Armed Bandits , 2008, NIPS.
[84] Ben J. A. Kröse,et al. Learning from delayed rewards , 1995, Robotics Auton. Syst..
[85] Paul B. Reverdy. Modeling Human Decision-making in Multi-armed Bandits , 2013 .
[86] W. F. Rosenberger,et al. Randomized play-the-winner clinical trials: review and recommendations , 1999, Controlled clinical trials.
[87] Rémi Munos,et al. A Finite-Time Analysis of Multi-armed Bandits Problems with Kullback-Leibler Divergences , 2011, COLT.
[88] Jean-Yves Audibert,et al. Minimax Policies for Adversarial and Stochastic Bandits , 2009, COLT.
[89] T. Colton. A Model for Selecting One of Two Medical Treatments , 1963 .
[90] Shein-Chung Chow,et al. Adaptive design methods in clinical trials – a review , 2008, Orphanet journal of rare diseases.
[91] E. S. Page. CONTINUOUS INSPECTION SCHEMES , 1954 .
[92] J. Sarkar. One-Armed Bandit Problems with Covariates , 1991 .
[93] Rémi Munos,et al. Thompson Sampling: An Asymptotically Optimal Finite-Time Analysis , 2012, ALT.
[94] Akimichi Takemura,et al. Optimality of Thompson Sampling for Gaussian Bandits Depends on Priors , 2013, AISTATS.
[95] T. Lai,et al. Optimal learning and experimentation in bandit problems , 2000 .
[96] David E. Bell,et al. Disappointment in Decision Making Under Uncertainty , 1985, Oper. Res..
[97] Chris Mesterharm,et al. Experience-efficient learning in associative bandit problems , 2006, ICML.
[98] Shipra Agrawal,et al. Analysis of Thompson Sampling for the Multi-armed Bandit Problem , 2011, COLT.
[99] Vianney Perchet,et al. The multi-armed bandit problem with covariates , 2011, ArXiv.
[100] Doina Precup,et al. Algorithms for multi-armed bandit problems , 2014, ArXiv.
[101] N. Balakrishnan. Methods and Applications of Statistics in Clinical Trials: Planning, Analysis, and Inferential Methods , 2014 .
[102] V. Bentkus. On Hoeffding’s inequalities , 2004, math/0410159.
[103] Csaba Szepesvári,et al. Adaptive Monte Carlo via Bandit Allocation , 2014, ICML.
[104] Weng Kee Wong,et al. Adaptive clinical trial designs for phase I cancer studies , 2014 .
[105] Max Chevalier,et al. A Multiple-Play Bandit Algorithm Applied to Recommender Systems , 2015, FLAIRS.
[106] M. Keane,et al. Decision-Making Under Uncertainty: Capturing Dynamic Brand Choice Processes in Turbulent Consumer Goods Markets , 1996 .
[107] Sebastian U. Stich,et al. On Two Continuum Armed Bandit Problems in High Dimensions , 2014, Theory of Computing Systems.
[108] Leslie G. Valiant,et al. A theory of the learnable , 1984, STOC '84.
[109] Jason L. Loeppky,et al. Improving Online Marketing Experiments with Drifting Multi-armed Bandits , 2015, ICEIS.
[110] David Hinkley,et al. Bootstrap Methods: Another Look at the Jackknife , 2008 .
[111] Sudipto Guha,et al. Stochastic Regret Minimization via Thompson Sampling , 2014, COLT.
[112] Aurélien Garivier,et al. On Bayesian Upper Confidence Bounds for Bandit Problems , 2012, AISTATS.
[113] N. E. Day. Two-stage designs for clinical trials , 1969, Biometrics.
[114] Martin L. Puterman,et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming , 1994 .
[115] Michael N. Katehakis,et al. The Multi-Armed Bandit Problem: Decomposition and Computation , 1987, Math. Oper. Res..
[116] Deepayan Chakrabarti,et al. Bandits for Taxonomies: A Model-based Approach , 2007, SDM.
[117] Mihir Bellare,et al. Notes on Randomized Algorithms , 2014, ArXiv.
[118] Judea Pearl,et al. Heuristics: intelligent search strategies for computer problem solving , 1984 .
[119] Wei Chu,et al. A contextual-bandit approach to personalized news article recommendation , 2010, WWW '10.
[120] Matthew J. Streeter,et al. Tighter Bounds for Multi-Armed Bandits with Expert Advice , 2009, COLT.
[121] Ole-Christoffer Granmo,et al. Solving Non-Stationary Bandit Problems by Random Sampling from Sibling Kalman Filters , 2010, IEA/AIE.
[122] Shie Mannor,et al. Thompson Sampling for Complex Online Problems , 2013, ICML.
[123] Ole-Christoffer Granmo,et al. A Two-Armed Bandit Based Scheme for Accelerated Decentralized Learning , 2011, IEA/AIE.
[124] Peter Auer,et al. Finite-time Analysis of the Multiarmed Bandit Problem , 2002, Machine Learning.
[125] Dean Eckles,et al. Thompson sampling with the online bootstrap , 2014, ArXiv.
[126] Gideon Weiss,et al. Four proofs of Gittins’ multiarmed bandit theorem , 2016, Ann. Oper. Res..
[127] Atsuyoshi Nakamura,et al. Algorithms for Adversarial Bandit Problems with Multiple Plays , 2010, ALT.
[128] David V. Hinkley,et al. Inference about the change-point in a sequence of binomial variables , 1970 .
[129] Raphaël Féraud,et al. EXP3 with drift detection for the switching bandit problem , 2015, 2015 IEEE International Conference on Data Science and Advanced Analytics (DSAA).
[130] Michal Valko,et al. Simple regret for infinitely many armed bandits , 2015, ICML.
[131] John Langford,et al. The Epoch-Greedy Algorithm for Multi-armed Bandits with Side Information , 2007, NIPS.
[132] Simon M. Lucas,et al. A Survey of Monte Carlo Tree Search Methods , 2012, IEEE Transactions on Computational Intelligence and AI in Games.
[133] John Langford,et al. Contextual Bandit Algorithms with Supervised Learning Guarantees , 2010, AISTATS.
[134] Aurélien Garivier,et al. Parametric Bandits: The Generalized Linear Case , 2010, NIPS.
[135] Sébastien Bubeck,et al. Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems , 2012, Found. Trends Mach. Learn..
[136] Peter R. Nelson,et al. Multiple Comparisons: Theory and Methods , 1997 .
[137] F. Rosenblatt. The perceptron: a probabilistic model for information storage and organization in the brain , 1958, Psychological review.
[138] Aleksandrs Slivkins,et al. The Best of Both Worlds: Stochastic and Adversarial Bandits , 2012, COLT.
[139] M. Woodroofe. A One-Armed Bandit Problem with a Concomitant Variable , 1979 .
[140] R. Weber. On the Gittins Index for Multiarmed Bandits , 1992 .
[141] Csaba Szepesvári,et al. Bandit Based Monte-Carlo Planning , 2006, ECML.