Martin J. Wainwright | Banghua Zhu | Jiantao Jiao | Cong Ma
[1] D. Horvitz, et al. A Generalization of Sampling Without Replacement from a Finite Universe, 1952.
[2] H. Strasser. Mathematical Theory of Statistics: Statistical Experiments and Asymptotic Decision Theory, 1986.
[3] A. Nemirovskii, et al. Some Problems on Nonparametric Estimation in Gaussian White Noise, 1987.
[4] A. Timan. Theory of Approximation of Functions of a Real Variable, 1994.
[5] Gerhard J. Woeginger, et al. Developments from a June 1996 seminar on Online algorithms: the state of the art, 1998.
[6] A. Nemirovski, et al. On estimation of the Lr norm of a regression function, 1999.
[7] G. Imbens, et al. Efficient Estimation of Average Treatment Effects Using the Estimated Propensity Score, 2000.
[8] Peter Auer, et al. The Nonstochastic Multiarmed Bandit Problem, 2002, SIAM J. Comput.
[9] G. Imbens, et al. Efficient Estimation of Average Treatment Effects Using the Estimated Propensity Score, 2002.
[10] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.
[11] P. Wawrzynski, et al. Truncated Importance Sampling for Reinforcement Learning with Experience Replay, 2007.
[12] E. Ionides. Truncated Importance Sampling, 2008.
[13] Gregory Valiant, et al. A CLT and tight lower bounds for estimating entropy, 2010, Electron. Colloquium Comput. Complex.
[14] Csaba Szepesvári, et al. Algorithms for Reinforcement Learning, 2010, Synthesis Lectures on Artificial Intelligence and Machine Learning.
[15] T. Cai, et al. Testing composite hypotheses, Hermite polynomials and optimal estimation of a nonsmooth functional, 2011, 1105.3039.
[16] Gregory Valiant, et al. Estimating the unseen: an n/log(n)-sample estimator for entropy and support size, shown optimal via new CLTs, 2011, STOC '11.
[17] Wei Chu, et al. Unbiased offline evaluation of contextual-bandit-based news article recommendation algorithms, 2010, WSDM '11.
[18] Joaquin Quiñonero Candela, et al. Counterfactual reasoning and learning systems: the example of computational advertising, 2013, J. Mach. Learn. Res.
[19] Paul Valiant, et al. Estimating the Unseen, 2013, NIPS.
[20] Yanjun Han, et al. Minimax Estimation of Discrete Distributions under ℓ1 Loss, 2014, ArXiv.
[21] Philip S. Thomas, et al. Personalized Ad Recommendation Systems for Life-Time Value Optimization with Guarantees, 2015, IJCAI.
[22] Lihong Li, et al. Toward Minimax Off-policy Value Estimation, 2015, AISTATS.
[23] F. Maxwell Harper, et al. The MovieLens Datasets: History and Context, 2016, TIIS.
[24] Yanjun Han, et al. Minimax Estimation of Functionals of Discrete Distributions, 2014, IEEE Transactions on Information Theory.
[25] Nan Jiang, et al. Doubly Robust Off-policy Value Evaluation for Reinforcement Learning, 2015, ICML.
[26] Yanjun Han, et al. Minimax rate-optimal estimation of KL divergence between discrete distributions, 2016, 2016 International Symposium on Information Theory and Its Applications (ISITA).
[27] Yihong Wu, et al. Minimax Rates of Entropy Estimation on Large Alphabets via Best Polynomial Approximation, 2014, IEEE Transactions on Information Theory.
[28] Philip S. Thomas, et al. Data-Efficient Off-Policy Policy Evaluation for Reinforcement Learning, 2016, ICML.
[29] Miroslav Dudík, et al. Optimal and Adaptive Off-policy Evaluation in Contextual Bandits, 2016, ICML.
[30] Ambuj Tewari, et al. From Ads to Interventions: Contextual Bandits in Mobile Health, 2017, Mobile Health - Sensors, Analytic Methods, and Applications.
[31] Sergey Levine, et al. Off-Policy Evaluation via Off-Policy Classification, 2019, NeurIPS.
[32] Martin J. Wainwright, et al. High-Dimensional Statistics, 2019.
[33] Yihong Wu, et al. Chebyshev polynomials, moment matching, and optimal estimation of the unseen, 2015, The Annals of Statistics.
[34] Yu-Xiang Wang, et al. Asymptotically Efficient Off-Policy Evaluation for Tabular Reinforcement Learning, 2020, AISTATS.
[35] Csaba Szepesvari, et al. Bandit Algorithms, 2020.
[36] Mengdi Wang, et al. Minimax-Optimal Off-Policy Evaluation with Linear Function Approximation, 2020, ICML.
[37] Nando de Freitas, et al. Hyperparameter Selection for Offline Reinforcement Learning, 2020, ArXiv.