Towards Off-Policy Learning for Ranking Policies with Logged Feedback
[1] Donglin Wang,et al. Learning How to Propagate Messages in Graph Neural Networks , 2021, KDD.
[2] Teng Xiao,et al. A General Offline Reinforcement Learning Framework for Interactive Recommendation , 2021, AAAI.
[3] Balázs Hidasi,et al. Recurrent neural networks , 2013, Scholarpedia.
[4] Weinan Zhang,et al. Interactive Recommender System via Knowledge Graph-enhanced Reinforcement Learning , 2020, SIGIR.
[5] Joemon M. Jose,et al. Self-Supervised Reinforcement Learning for Recommender Systems , 2020, SIGIR.
[6] S. Levine,et al. Conservative Q-Learning for Offline Reinforcement Learning , 2020, NeurIPS.
[7] S. Levine,et al. Offline Reinforcement Learning: Tutorial, Review, and Perspectives on Open Problems , 2020, ArXiv.
[8] Dawei Yin,et al. Pseudo Dyna-Q: A Reinforcement Learning Framework for Interactive Recommendation , 2020, WSDM.
[9] Craig Boutilier,et al. RecSim: A Configurable Simulation Platform for Recommender Systems , 2019, ArXiv.
[10] Zaiqiao Meng,et al. Hierarchical Neural Variational Model for Personalized Sequential Recommendation , 2019, WWW.
[11] Doina Precup,et al. Off-Policy Deep Reinforcement Learning without Exploration , 2018, ICML.
[12] Ed H. Chi,et al. Top-K Off-Policy Correction for a REINFORCE Recommender System , 2018, WSDM.
[13] Julian J. McAuley,et al. Self-Attentive Sequential Recommendation , 2018, 2018 IEEE International Conference on Data Mining (ICDM).
[14] Liang Zhang,et al. Deep reinforcement learning for page-wise recommendations , 2018, RecSys.
[15] Nicholas Jing Yuan,et al. DRN: A Deep Reinforcement Learning Framework for News Recommendation , 2018, WWW.
[16] Liang Zhang,et al. Recommendations with Negative Feedback via Pairwise Deep Reinforcement Learning , 2018, KDD.
[17] Matthew D. Hoffman,et al. Variational Autoencoders for Collaborative Filtering , 2018, WWW.
[18] Ke Wang,et al. Personalized Top-N Sequential Recommendation via Convolutional Sequence Embedding , 2018, WSDM.
[19] Sergey Levine,et al. Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor , 2018, ICML.
[20] Sergey Levine,et al. Reinforcement Learning with Deep Energy-Based Policies , 2017, ICML.
[21] Marc G. Bellemare,et al. Safe and Efficient Off-Policy Reinforcement Learning , 2016, NIPS.
[22] David M. Blei,et al. Variational Inference: A Review for Statisticians , 2016, ArXiv.
[23] Tom Schaul,et al. Prioritized Experience Replay , 2015, ICLR.
[24] Yuval Tassa,et al. Continuous control with deep reinforcement learning , 2015, ICLR.
[25] Geoffrey E. Hinton,et al. Distilling the Knowledge in a Neural Network , 2015, ArXiv.
[26] Ben Taskar,et al. Posterior Regularization for Structured Latent Variable Models , 2010, J. Mach. Learn. Res.
[27] Lars Schmidt-Thieme,et al. BPR: Bayesian Personalized Ranking from Implicit Feedback , 2009, UAI.
[28] Yifan Hu,et al. Collaborative Filtering for Implicit Feedback Datasets , 2008, 2008 Eighth IEEE International Conference on Data Mining.
[29] Tie-Yan Liu,et al. Listwise approach to learning to rank: theory and algorithm , 2008, ICML '08.
[30] Tie-Yan Liu,et al. Learning to rank: from pairwise approach to listwise approach , 2007, ICML '07.
[31] T. Heskes,et al. Expectation propagation for approximate inference in dynamic Bayesian networks , 2002, UAI 2002.
[32] Geoffrey E. Hinton,et al. A View of the EM Algorithm that Justifies Incremental, Sparse, and other Variants , 1998, Learning in Graphical Models.
[33] S. Robertson. The probability ranking principle in IR , 1997.
[34] Peter Dayan,et al. Q-learning , 1992, Machine Learning.
[35] G. Crooks. On Measures of Entropy and Information , 2015.