Paul Mineiro | Nikos Karampatziakis | Weizhu Chen | Sebastian Kochman | Jade Huang | Kathy Osborne
[1] John Langford, et al. Residual Loss Prediction: Reinforcement Learning With No Incremental Feedback, 2018, ICLR.
[2] J. Langford, et al. The Epoch-Greedy algorithm for contextual multi-armed bandits, 2007, NIPS 2007.
[3] John Langford, et al. Making Contextual Decisions with Low Technical Debt, 2016.
[4] Jian Sun, et al. Deep Residual Learning for Image Recognition, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[5] Yelong Shen, et al. FusionNet: Fusing via Fully-Aware Attention with Application to Machine Comprehension, 2017, ICLR.
[6] Marc G. Bellemare, et al. Dopamine: A Research Framework for Deep Reinforcement Learning, 2018, ArXiv.
[7] Lihong Li, et al. Counterfactual Estimation and Optimization of Click Metrics in Search Engines: A Case Study, 2015, WWW.
[8] Jürgen Schmidhuber, et al. Long Short-Term Memory, 1997, Neural Computation.
[9] Ion Stoica, et al. Ray RLLib: A Composable and Scalable Reinforcement Learning Library, 2017, NIPS 2017.
[10] Xiaohui Ye, et al. Horizon: Facebook's Open Source Applied Reinforcement Learning Platform, 2018, ArXiv.
[11] Ed H. Chi, et al. Top-K Off-Policy Correction for a REINFORCE Recommender System, 2018, WSDM.
[12] Sergey Levine, et al. Trust Region Policy Optimization, 2015, ICML.
[13] Wei Chu, et al. A contextual-bandit approach to personalized news article recommendation, 2010, WWW '10.
[14] Thorsten Joachims, et al. Recommendations as Treatments: Debiasing Learning and Evaluation, 2016, ICML.