Yu-Xiang Wang | Yifei Ma | Balakrishnan Narayanaswamy
[1] J. Robins, et al. Semiparametric Efficiency in Multivariate Regression Models with Missing Data, 1995.
[2] P. Austin. An Introduction to Propensity Score Methods for Reducing the Effects of Confounding in Observational Studies, 2011, Multivariate Behavioral Research.
[3] Alexandros Karatzoglou, et al. Session-based Recommendations with Recurrent Neural Networks, 2015, ICLR.
[4] Marc G. Bellemare, et al. Safe and Efficient Off-Policy Reinforcement Learning, 2016, NIPS.
[5] John Langford, et al. Doubly Robust Policy Evaluation and Learning, 2011, ICML.
[6] S. Julious, et al. Confounding and Simpson's Paradox, 1994, BMJ.
[7] Charles J. Geyer. 5601 Notes: The Subsampling Bootstrap, 2002.
[8] Joseph P. Romano, et al. Large Sample Confidence Regions Based on Subsamples under Minimal Assumptions, 1994.
[9] Joseph Kang, et al. Demystifying Double Robustness: A Comparison of Alternative Strategies for Estimating a Population Mean from Incomplete Data, 2007, arXiv:0804.2958.
[10] Martin Wattenberg, et al. Ad Click Prediction: A View from the Trenches, 2013, KDD.
[11] E. H. Simpson, et al. The Interpretation of Interaction in Contingency Tables, 1951.
[12] Sham M. Kakade, et al. A Natural Policy Gradient, 2001, NIPS.
[13] D. Horvitz, et al. A Generalization of Sampling Without Replacement from a Finite Universe, 1952.
[14] Mark J. van der Laan, et al. Data-Adaptive Selection of the Truncation Level for Inverse-Probability-of-Treatment-Weighted Estimators, 2008.
[15] Yishay Mansour, et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation, 1999, NIPS.
[16] Joaquin Quiñonero Candela, et al. Counterfactual Reasoning and Learning Systems: The Example of Computational Advertising, 2013, J. Mach. Learn. Res.
[17] Thorsten Joachims, et al. Counterfactual Risk Minimization: Learning from Logged Bandit Feedback, 2015, ICML.
[18] M. de Rijke, et al. Deep Learning with Logged Bandit Feedback, 2018, ICLR.
[19] J. Robins, et al. Estimation of Regression Coefficients When Some Regressors Are Not Always Observed, 1994.
[20] Sergey Levine, et al. Reinforcement Learning with Deep Energy-Based Policies, 2017, ICML.
[21] J. Robins, et al. Doubly Robust Estimation in Missing Data and Causal Inference Models, 2005, Biometrics.
[22] Lihong Li, et al. Learning from Logged Implicit Exploration Data, 2010, NIPS.
[23] Sergey Levine, et al. Trust Region Policy Optimization, 2015, ICML.
[24] M. de Rijke, et al. Large-Scale Validation of Counterfactual Learning Methods: A Test-Bed, 2016, arXiv.
[25] Nan Jiang, et al. Doubly Robust Off-Policy Value Evaluation for Reinforcement Learning, 2015, ICML.
[26] John Shawe-Taylor, et al. Generalization Performance of Support Vector Machines and Other Pattern Classifiers, 1999.
[27] Alec Radford, et al. Proximal Policy Optimization Algorithms, 2017, arXiv.
[28] Steffen Rendle, et al. Factorization Machines with libFM, 2012, TIST.
[29] Sébastien Bubeck, et al. Regret Analysis of Stochastic and Nonstochastic Multi-Armed Bandit Problems, 2012, Found. Trends Mach. Learn.
[30] Chih-Jen Lin, et al. Field-Aware Factorization Machines for CTR Prediction, 2016, RecSys.
[31] Yishay Mansour, et al. Learning Bounds for Importance Weighting, 2010, NIPS.
[32] Philip S. Thomas, et al. Data-Efficient Off-Policy Policy Evaluation for Reinforcement Learning, 2016, ICML.