Bo Dai | Dale Schuurmans | Ruiyi Zhang | Lihong Li