暂无分享,去创建一个
Xiaohui Ye | Yitao Liang | Vivek Narayanan | Yuchen He | Zachary Kaden | Jason Gauci | Kittipat Virochsiri | Edoardo Conti | Edoardo Conti | J. Gauci | Yitao Liang | Scott Fujimoto | Yuchen He | Kittipat Virochsiri | Zachary Kaden | Vivek Narayanan | Xiaohui Ye
[1] Sergey Ioffe,et al. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift , 2015, ICML.
[2] Yuval Tassa,et al. Continuous control with deep reinforcement learning , 2015, ICLR.
[3] Jürgen Schmidhuber,et al. World Models , 2018, ArXiv.
[4] Marc G. Bellemare,et al. A Distributional Perspective on Reinforcement Learning , 2017, ICML.
[5] John Langford,et al. Doubly Robust Policy Evaluation and Learning , 2011, ICML.
[6] Sergey Levine,et al. Neural Network Dynamics for Model-Based Deep Reinforcement Learning with Model-Free Fine-Tuning , 2017, 2018 IEEE International Conference on Robotics and Automation (ICRA).
[7] S. Srihari. Mixture Density Networks , 1994 .
[8] Sergey Levine,et al. Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor , 2018, ICML.
[9] Tom Schaul,et al. Rainbow: Combining Improvements in Deep Reinforcement Learning , 2017, AAAI.
[10] Shane Legg,et al. Noisy Networks for Exploration , 2017, ICLR.
[11] Peter Auer,et al. The Nonstochastic Multiarmed Bandit Problem , 2002, SIAM J. Comput..
[12] Sergey Levine,et al. Trust Region Policy Optimization , 2015, ICML.
[13] Philip S. Thomas,et al. Data-Efficient Off-Policy Policy Evaluation for Reinforcement Learning , 2016, ICML.
[14] Richard Evans,et al. Deep Reinforcement Learning in Large Discrete Action Spaces , 2015, 1512.07679.
[15] Carl E. Rasmussen,et al. PILCO: A Model-Based and Data-Efficient Approach to Policy Search , 2011, ICML.
[16] Tom Schaul,et al. Dueling Network Architectures for Deep Reinforcement Learning , 2015, ICML.
[17] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.
[18] Ion Stoica,et al. Ray RLLib: A Composable and Scalable Reinforcement Learning Library , 2017, NIPS 2017.
[19] Doina Precup,et al. Off-Policy Deep Reinforcement Learning without Exploration , 2018, ICML.
[20] Shane Legg,et al. Human-level control through deep reinforcement learning , 2015, Nature.
[21] Pat Langley,et al. Crafting Papers on Machine Learning , 2000, ICML.
[22] Ed H. Chi,et al. Top-K Off-Policy Correction for a REINFORCE Recommender System , 2018, WSDM.
[23] Richard S. Sutton,et al. Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.
[24] Joaquin Quiñonero Candela,et al. Counterfactual reasoning and learning systems: the example of computational advertising , 2013, J. Mach. Learn. Res..
[25] Samy Bengio,et al. Device Placement Optimization with Reinforcement Learning , 2017, ICML.
[26] D. Horvitz,et al. A Generalization of Sampling Without Replacement from a Finite Universe , 1952 .
[27] Sergey Levine,et al. Deep visual foresight for planning robot motion , 2016, 2017 IEEE International Conference on Robotics and Automation (ICRA).
[28] R. Bellman. A Markovian Decision Process , 1957 .
[29] David Silver,et al. Deep Reinforcement Learning with Double Q-Learning , 2015, AAAI.
[30] Nan Jiang,et al. Doubly Robust Off-policy Value Evaluation for Reinforcement Learning , 2015, ICML.
[31] Martin A. Riedmiller,et al. Embed to Control: A Locally Linear Latent Dynamics Model for Control from Raw Images , 2015, NIPS.
[32] Scott Shenker,et al. Spark: Cluster Computing with Working Sets , 2010, HotCloud.
[33] Miroslav Dudík,et al. Optimal and Adaptive Off-policy Evaluation in Contextual Bandits , 2016, ICML.
[34] Trevor Darrell,et al. Caffe: Convolutional Architecture for Fast Feature Embedding , 2014, ACM Multimedia.
[35] John Langford,et al. Making Contextual Decisions with Low Technical Debt , 2016 .
[36] Liang Zhang,et al. Deep reinforcement learning for page-wise recommendations , 2018, RecSys.
[37] Gediminas Adomavicius,et al. Toward the next generation of recommender systems: a survey of the state-of-the-art and possible extensions , 2005, IEEE Transactions on Knowledge and Data Engineering.
[38] Liang Zhang,et al. Recommendations with Negative Feedback via Pairwise Deep Reinforcement Learning , 2018, KDD.
[39] Alec Radford,et al. Proximal Policy Optimization Algorithms , 2017, ArXiv.
[40] Nicholas Jing Yuan,et al. DRN: A Deep Reinforcement Learning Framework for News Recommendation , 2018, WWW.
[41] John Langford,et al. The Epoch-Greedy Algorithm for Multi-armed Bandits with Side Information , 2007, NIPS.
[42] Georg Heigold,et al. A Gaussian Mixture Model layer jointly optimized with discriminative features within a Deep Neural Network architecture , 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).