Sequential Resource Allocation in Linear Stochastic Bandits
[1] Alessandro Lazaric, et al. Transfer from Multiple MDPs, 2011, NIPS.
[2] Robert E. Bechhofer, et al. Sequential Identification and Ranking Procedures, 1968.
[3] T. L. Lai, Herbert Robbins. Asymptotically Efficient Adaptive Allocation Rules, 1985, Advances in Applied Mathematics.
[4] Sébastien Bubeck, et al. Multiple Identifications in Multi-Armed Bandits, 2012, ICML.
[5] W. J. Studden, et al. Theory of Optimal Experiments, 1972.
[6] Wei Chu, et al. Contextual Bandits with Linear Payoff Functions, 2011, AISTATS.
[7] Andrea Bonarini, et al. Transfer of samples in batch reinforcement learning, 2008, ICML '08.
[8] Ilja Kuzborskij, et al. Learning by Transferring from Auxiliary Hypotheses, 2014, arXiv.
[9] Matthew Malloy, et al. lil' UCB: An Optimal Exploration Algorithm for Multi-Armed Bandits, 2013, COLT.
[10] Csaba Szepesvári, et al. Exploration-exploitation tradeoff using variance estimates in multi-armed bandits, 2009, Theor. Comput. Sci.
[11] Aurélien Garivier, et al. On the Complexity of Best-Arm Identification in Multi-Armed Bandit Models, 2014, J. Mach. Learn. Res.
[12] Shuai Li, et al. Online Clustering of Bandits, 2014, ICML.
[13] Peter Auer, et al. Finite-time Analysis of the Multiarmed Bandit Problem, 2002, Machine Learning.
[14] Jason Weston, et al. A unified architecture for natural language processing: deep neural networks with multitask learning, 2008, ICML '08.
[15] Alessandro Lazaric, et al. Best-Arm Identification in Linear Bandits, 2014, NIPS.
[16] Varun Grover, et al. Active learning in heteroscedastic noise, 2010, Theor. Comput. Sci.
[17] Nando de Freitas, et al. On correlation and budget constraints in model-based bandit optimization with application to automatic machine learning, 2014, AISTATS.
[18] Selin Damla Ahipasaoglu, et al. Solving ellipsoidal inclusion and optimal experimental design problems: theory and algorithms, 2009.
[19] Lihong Li, et al. Sample Complexity of Multi-task Reinforcement Learning, 2013, UAI.
[20] Friedrich Pukelsheim, et al. Optimal weights for experimental designs on linearly independent support points, 1991.
[21] Rémi Munos, et al. From Bandits to Monte-Carlo Tree Search: The Optimistic Principle Applied to Optimization and Planning, 2014, Found. Trends Mach. Learn.
[22] Stanley Osher, et al. A survey on level set methods for inverse problems and optimal design, 2005, European Journal of Applied Mathematics.
[23] P. Bickel, et al. Regularized estimation of large covariance matrices, 2008, arXiv:0803.1909.
[24] R. Agrawal. Sample mean based index policies with O(log n) regret for the multi-armed bandit problem, 1995, Advances in Applied Probability.
[25] F. Pukelsheim. Optimal Design of Experiments (Classics in Applied Mathematics, 50), 2006.
[26] F. Pukelsheim. Optimal Design of Experiments, 1993, Wiley.
[27] Guillaume Sagnol, et al. Submodularity and randomized rounding techniques for optimal experimental design, 2010, Electron. Notes Discret. Math.
[28] Marta Soare. Active Learning in Linear Stochastic Bandits, 2013.
[29] D. Titterington. Optimal design: Some geometrical aspects of D-optimality, 1975.
[30] Alessandro Lazaric, et al. Sequential Transfer in Multi-armed Bandit with Finite Set of Models, 2013, NIPS.
[31] Koby Crammer, et al. Learning from Multiple Sources, 2006, NIPS.
[32] Antonio Torralba, et al. Transfer Learning by Borrowing Examples for Multiclass Object Detection, 2011, NIPS.
[33] Christoph H. Lampert, et al. Curriculum learning of multiple tasks, 2015, CVPR.
[34] Qiang Yang, et al. A Survey on Transfer Learning, 2010, IEEE Transactions on Knowledge and Data Engineering.
[35] Alessandro Lazaric, et al. Best Arm Identification: A Unified Approach to Fixed Budget and Fixed Confidence, 2012, NIPS.
[36] Varun Grover, et al. Active Learning in Multi-armed Bandits, 2008, ALT.
[37] R. Munos, et al. Best Arm Identification in Multi-Armed Bandits, 2010, COLT.
[38] Michèle Sebag, et al. Experimental Design in Dynamical System Identification: A Bandit-Based Active Learning Approach, 2014, ECML/PKDD.
[39] Andrew W. Moore, et al. Hoeffding Races: Accelerating Model Selection Search for Classification and Function Approximation, 1993, NIPS.
[40] Wei Chu, et al. A contextual-bandit approach to personalized news article recommendation, 2010, WWW '10.
[41] Oren Somekh, et al. Almost Optimal Exploration in Multi-Armed Bandits, 2013, ICML.
[42] J. Kiefer, et al. The Equivalence of Two Extremum Problems, 1960, Canadian Journal of Mathematics.
[43] Marta Soare. Multi-task Linear Bandits, 2014.
[44] Yaming Yu. Monotonic convergence of a general algorithm for computing optimal designs, 2009, arXiv:0905.2646.
[45] Thomas P. Hayes, et al. Stochastic Linear Optimization under Bandit Feedback, 2008, COLT.
[46] Andreas Krause, et al. Information-Theoretic Regret Bounds for Gaussian Process Optimization in the Bandit Setting, 2009, IEEE Transactions on Information Theory.
[47] J. Bien, R. Tibshirani. Sparse Estimation of a Covariance Matrix, 2010.
[48] R. Munos, et al. Kullback–Leibler upper confidence bounds for optimal sequential allocation, 2012, arXiv:1210.1136.
[49] R. Dennis Cook, et al. Heteroscedastic G-optimal Designs, 1993.
[50] I. Johnstone, et al. Asymptotically Optimal Procedures for Sequential Adaptive Selection of the Best of Several Normal Means, 1982.
[51] Shie Mannor, et al. Latent Bandits, 2014, ICML.
[52] T. Lai, et al. Least Squares Estimates in Stochastic Regression Models with Applications to Identification and Control of Dynamic Systems, 1982.
[53] Ambuj Tewari, et al. PAC Subset Selection in Stochastic Multi-armed Bandits, 2012, ICML.
[54] Stephen P. Boyd, et al. Convex Optimization, 2004, Cambridge University Press.
[55] Sébastien Bubeck, et al. Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems, 2012, Found. Trends Mach. Learn.
[56] Akimichi Takemura, et al. An asymptotically optimal policy for finite support models in the multiarmed bandit problem, 2009, Machine Learning.
[57] W. R. Thompson. On the Likelihood that One Unknown Probability Exceeds Another in View of the Evidence of Two Samples, 1933.
[58] Jinbo Bi, et al. Active learning via transductive experimental design, 2006, ICML.
[59] E. Paulson. A Sequential Procedure for Selecting the Population with the Largest Mean from k Normal Populations, 1964.
[60] Csaba Szepesvári, et al. Improved Algorithms for Linear Stochastic Bandits, 2011, NIPS.
[61] H. Robbins. Some aspects of the sequential design of experiments, 1952.
[62] J. Merikoski, et al. Inequalities for spreads of matrix sums and products, 2004.
[63] Shie Mannor, et al. Action Elimination and Stopping Conditions for the Multi-Armed Bandit and Reinforcement Learning Problems, 2006, J. Mach. Learn. Res.
[64] D. Wiens, et al. V-optimal designs for heteroscedastic regression, 2014.
[65] Peter Stone, et al. Transfer Learning for Reinforcement Learning Domains: A Survey, 2009, J. Mach. Learn. Res.
[66] Alessandro Lazaric, et al. Multi-Bandit Best Arm Identification, 2011, NIPS.
[67] Peter Auer, et al. Using Confidence Bounds for Exploitation-Exploration Trade-offs, 2003, J. Mach. Learn. Res.
[68] Rémi Munos, et al. Pure Exploration for Multi-Armed Bandit Problems, 2008, arXiv.
[69] Rémi Munos, et al. Pure Exploration in Multi-armed Bandits Problems, 2009, ALT.
[70] Guillaume Sagnol, et al. Approximation of a maximum-submodular-coverage problem involving spectral functions, with application to experimental designs, 2010, Discret. Appl. Math.
[71] Shivaram Kalyanakrishnan, et al. Information Complexity in Bandit Subset Selection, 2013, COLT.
[72] Yishay Mansour, et al. Domain Adaptation: Learning Bounds and Algorithms, 2009, COLT.
[73] W. Fuller, et al. Estimation for a Linear Regression Model with Unknown Diagonal Covariance Matrix, 1978.