[1] Yu Bai,et al. Near Optimal Provable Uniform Convergence in Off-Policy Evaluation for Reinforcement Learning , 2021, AISTATS.
[2] Csaba Szepesvári,et al. Finite-Time Bounds for Fitted Value Iteration , 2008, J. Mach. Learn. Res.
[3] Aleksandrs Slivkins,et al. Bandits with Knapsacks , 2013, 2013 IEEE 54th Annual Symposium on Foundations of Computer Science.
[4] Marie Davidian,et al. Robust estimation of optimal dynamic treatment regimes for sequential treatment decisions , 2013, Biometrika.
[5] Daniel R. Jiang,et al. Lookahead-Bounded Q-Learning , 2020, ICML.
[6] Stefan Wager,et al. Efficient Policy Learning , 2017, ArXiv.
[7] Yuval Emek,et al. Stateful Posted Pricing with Vanishing Regret via Dynamic Deterministic Markov Decision Processes , 2020, NeurIPS.
[8] Omar Besbes,et al. Blind Network Revenue Management , 2011, Oper. Res.
[9] Virag Shah,et al. Semi-parametric dynamic contextual pricing , 2019, NeurIPS.
[10] Xinkun Nie,et al. Learning When-to-Treat Policies , 2019, Journal of the American Statistical Association.
[11] Sivaraman Balakrishnan,et al. Semiparametric Counterfactual Density Estimation , 2021, Biometrika.
[12] Thorsten Joachims,et al. The Self-Normalized Estimator for Counterfactual Learning , 2015, NIPS.
[13] Rémi Munos,et al. Error Bounds for Approximate Policy Iteration , 2003, ICML.
[14] Masatoshi Uehara,et al. Fast Rates for the Regret of Offline Reinforcement Learning , 2021, COLT.
[15] Richard L. Tweedie,et al. Markov Chains and Stochastic Stability , 1993, Communications and Control Engineering Series.
[16] W. Marsden. I and J , 2012.
[17] Nikhil R. Devanur,et al. An efficient algorithm for contextual bandits with knapsacks, and an extension to concave objectives , 2015, COLT.
[18] G. A. Young,et al. High‐dimensional Statistics: A Non‐asymptotic Viewpoint, Martin J. Wainwright, Cambridge University Press, 2019, xvii + 552 pages, £57.99, hardback ISBN: 978‐1‐1084‐9802‐9 , 2020, International Statistical Review.
[19] Nikos Vlassis,et al. More Efficient Off-Policy Evaluation through Regularized Targeted Learning , 2019, ICML.
[20] D. Simchi-Levi,et al. A Statistical Learning Approach to Personalization in Revenue Management , 2015, Manag. Sci.
[21] J. Robins,et al. Double/Debiased Machine Learning for Treatment and Structural Parameters , 2017.
[22] Gaël Varoquaux,et al. Scikit-learn: Machine Learning in Python , 2011, J. Mach. Learn. Res.
[23] Masatoshi Uehara,et al. Efficiently Breaking the Curse of Horizon in Off-Policy Evaluation with Double Reinforcement Learning , 2019.
[24] Renato Paes Leme,et al. Feature-based Dynamic Pricing , 2016, EC.
[25] Ameet Talwalkar,et al. Foundations of Machine Learning , 2012, Adaptive computation and machine learning.
[26] Huang Bojun. Steady State Analysis of Episodic Reinforcement Learning , 2020, NeurIPS.
[27] Donglin Zeng,et al. New Statistical Learning Methods for Estimating Optimal Dynamic Treatment Regimes , 2015, Journal of the American Statistical Association.
[28] Mohsen Bayati,et al. Dynamic Pricing with Demand Covariates , 2016, arXiv:1604.07463.
[29] Wojciech Zaremba,et al. OpenAI Gym , 2016, ArXiv.
[30] Zhengyuan Zhou,et al. Offline Multi-Action Policy Learning: Generalization and Optimization , 2018, Oper. Res.
[31] Bruno Scherrer,et al. Approximate Policy Iteration Schemes: A Comparison , 2014, ICML.
[32] Garrett J. van Ryzin,et al. A Multiproduct Dynamic Pricing Problem and Its Applications to Network Yield Management , 1997, Oper. Res.
[33] Nathan Kallus,et al. Minimax-Optimal Policy Learning Under Unobserved Confounding , 2020, Manag. Sci.
[34] Yuhong Yang,et al. Information-theoretic determination of minimax rates of convergence , 1999.
[35] He Wang,et al. A Re-Solving Heuristic with Uniformly Bounded Loss for Network Revenue Management , 2018, Manag. Sci.
[36] N. B. Keskin,et al. Personalized Dynamic Pricing with Machine Learning: High Dimensional Features and Heterogeneous Elasticity , 2020.
[37] Adel Javanmard,et al. Dynamic Pricing in High-Dimensions , 2016, J. Mach. Learn. Res.
[38] Yisong Yue,et al. Batch Policy Learning under Constraints , 2019, ICML.
[39] N. Bora Keskin,et al. Personalized Dynamic Pricing with Machine Learning: High-Dimensional Features and Heterogeneous Elasticity , 2021, Manag. Sci.
[40] Robert L. Bray. The Multisecretary Problem with Continuous Valuations , 2019.
[41] P. Cochat,et al. Et al , 2008, Archives de pediatrie : organe officiel de la Societe francaise de pediatrie.
[42] Shipra Agrawal,et al. Learning in Structured MDPs with Convex Cost Functions: Improved Regret Bounds for Inventory Management , 2019, EC.
[43] John N. Tsitsiklis,et al. Neuro-Dynamic Programming , 1996, Encyclopedia of Machine Learning.
[44] Mengdi Wang,et al. Minimax-Optimal Off-Policy Evaluation with Linear Function Approximation , 2020, ICML.
[45] James B. Orlin,et al. Adaptive Data-Driven Inventory Control with Censored Demand Based on Kaplan-Meier Estimator , 2011, Oper. Res.
[46] Yifei Ma,et al. Optimal Off-Policy Evaluation for Reinforcement Learning with Marginalized Importance Sampling , 2019, NeurIPS.
[47] G. Gallego,et al. Revenue Management and Pricing Analytics , 2019, International Series in Operations Research & Management Science.
[48] Qiang Liu,et al. Doubly Robust Bias Reduction in Infinite Horizon Off-Policy Estimation , 2019, ICLR.
[49] Csaba Szepesvári,et al. Regularized least-squares regression: Learning from a β-mixing sequence , 2012.
[50] Nan Jiang,et al. Information-Theoretic Considerations in Batch Reinforcement Learning , 2019, ICML.
[51] Sean P. Meyn. Control Techniques for Complex Networks: Workload , 2007.
[52] D. Pollard. Empirical Processes: Theory and Applications , 1990.
[53] J. Cima,et al. On weak* convergence in H¹ , 1996.
[54] Roman Vershynin,et al. High-Dimensional Probability , 2018.
[55] John Langford,et al. Doubly Robust Policy Evaluation and Optimization , 2014, ArXiv.
[56] Csaba Szepesvári,et al. Fitted Q-iteration in continuous action-space MDPs , 2007, NIPS.
[57] Panos M. Pardalos,et al. Approximate dynamic programming: solving the curses of dimensionality , 2009, Optim. Methods Softw.
[58] Masatoshi Uehara,et al. Double Reinforcement Learning for Efficient Off-Policy Evaluation in Markov Decision Processes , 2019, J. Mach. Learn. Res.
[59] Rémi Munos,et al. Performance Bounds in Lp-norm for Approximate Value Iteration , 2007, SIAM J. Control. Optim.
[60] Philip S. Thomas,et al. Data-Efficient Off-Policy Policy Evaluation for Reinforcement Learning , 2016, ICML.
[61] Nan Jiang,et al. Doubly Robust Off-policy Value Evaluation for Reinforcement Learning , 2015, ICML.
[62] Kostas Bimpikis,et al. Spatial pricing in ride-sharing networks , 2016, NetEcon@EC.