Safe Exploration for Optimizing Contextual Bandits
[1] Wei Chu, et al. Unbiased offline evaluation of contextual-bandit-based news article recommendation algorithms, 2010, WSDM '11.
[2] Thorsten Joachims, et al. Unbiased Learning-to-Rank with Biased Feedback, 2016, WSDM.
[3] M. de Rijke, et al. Deep Learning with Logged Bandit Feedback, 2018, ICLR.
[4] Thorsten Joachims, et al. Batch learning from logged bandit feedback through counterfactual risk minimization, 2015, J. Mach. Learn. Res.
[5] Lihong Li, et al. Learning from Logged Implicit Exploration Data, 2010, NIPS.
[6] Yifan Wu, et al. Conservative Bandits, 2016, ICML.
[7] Michèle Sebag, et al. Exploration vs Exploitation vs Safety: Risk-Aware Multi-Armed Bandits, 2013, ACML.
[8] M. de Rijke, et al. Balancing Speed and Quality in Online Learning to Rank for Information Retrieval, 2017, CIKM.
[9] Yiming Yang, et al. RCV1: A New Benchmark Collection for Text Categorization Research, 2004, J. Mach. Learn. Res.
[10] Ashish Kapoor, et al. Risk-Aware Algorithms for Adversarial Contextual Bandits, 2016, ArXiv.
[11] Jaana Kekäläinen, et al. Cumulated gain-based evaluation of IR techniques, 2002, TOIS.
[12] Filip Radlinski, et al. How does clickthrough data reflect retrieval quality?, 2008, CIKM '08.
[13] Jiafeng Guo, et al. Reinforcement Learning to Rank with Markov Decision Process, 2017, SIGIR.
[14] Ken Lang, et al. NewsWeeder: Learning to Filter Netnews, 1995, ICML.
[15] M. de Rijke, et al. Bayesian Ranker Comparison Based on Historical User Interactions, 2015, SIGIR.
[16] M. de Rijke, et al. To Model or to Intervene: A Comparison of Counterfactual and Online Learning to Rank from User Interactions, 2019, SIGIR.
[17] Katja Hofmann, et al. Balancing Exploration and Exploitation in Learning to Rank Online, 2011, ECIR.
[18] M. de Rijke, et al. An Introduction to Click Models for Web Search: SIGIR 2015 Tutorial, 2015, SIGIR.
[19] M. de Rijke, et al. Modeling clicks beyond the first result page, 2013, CIKM.
[21] Jonathan J. Hull, et al. A Database for Handwritten Text Recognition Research, 1994, IEEE Trans. Pattern Anal. Mach. Intell.
[22] M. de Rijke, et al. Online Learning to Rank for Information Retrieval: SIGIR 2016 Tutorial, 2016, SIGIR.
[23] Chris Mesterharm, et al. Experience-efficient learning in associative bandit problems, 2006, ICML.
[24] Benjamin Van Roy, et al. Conservative Contextual Linear Bandits, 2016, NIPS.
[25] Maarten de Rijke, et al. Probabilistic Multileave Gradient Descent, 2016, ECIR.
[26] Thorsten Joachims, et al. Optimizing search engines using clickthrough data, 2002, KDD.
[27] Ambuj Tewari, et al. Efficient bandit algorithms for online multiclass prediction, 2008, ICML '08.
[28] Qiang Wu, et al. Adapting boosting for information retrieval measures, 2010, Information Retrieval.
[29] John Langford, et al. Exploration scavenging, 2008, ICML '08.
[30] Thorsten Joachims, et al. Interactively optimizing information retrieval systems as a dueling bandits problem, 2009, ICML '09.
[31] M. de Rijke, et al. Click Models for Web Search, 2015, Click Models for Web Search.
[32] Dorota Glowacka, et al. Bandit Algorithms in Information Retrieval, 2019, Found. Trends Inf. Retr.
[33] Thorsten Joachims, et al. Accurately interpreting clickthrough data as implicit feedback, 2005, SIGIR '05.
[34] John Langford, et al. Off-policy evaluation for slate recommendation, 2016, NIPS.
[35] Wei Chu, et al. A contextual-bandit approach to personalized news article recommendation, 2010, WWW '10.
[36] Claudio Gentile, et al. Boltzmann Exploration Done Right, 2017, NIPS.
[37] M. de Rijke, et al. BubbleRank: Safe Online Learning to Re-Rank via Implicit Click Feedback, 2018, UAI.
[38] Marc Najork, et al. Position Bias Estimation for Unbiased Learning to Rank in Personal Search, 2018, WSDM.
[39] Tao Qin, et al. Introducing LETOR 4.0 Datasets, 2013, ArXiv.
[40] Shipra Agrawal, et al. Thompson Sampling for Contextual Bandits with Linear Payoffs, 2012, ICML.
[41] Thorsten Joachims, et al. Counterfactual Risk Minimization: Learning from Logged Bandit Feedback, 2015, ICML.
[42] John Langford, et al. The offset tree for learning with partial labels, 2008, KDD.
[43] Fabrizio Silvestri, et al. Post-Learning Optimization of Tree Ensembles for Efficient Ranking, 2016, SIGIR.
[44] Feng Fu, et al. Risk-aware multi-armed bandit problem with application to portfolio selection, 2017, Royal Society Open Science.
[45] Philip S. Thomas, et al. High-Confidence Off-Policy Evaluation, 2015, AAAI.
[46] Yi Chang, et al. Yahoo! Learning to Rank Challenge Overview, 2010, Yahoo! Learning to Rank Challenge.
[47] Katja Hofmann, et al. Fidelity, Soundness, and Efficiency of Interleaved Comparison Methods, 2013, TOIS.
[48] Katja Hofmann, et al. Contextual Bandits for Information Retrieval, 2011.
[49] Gediminas Adomavicius, et al. Incorporating contextual information in recommender systems using a multidimensional approach, 2005, TOIS.
[50] Javier García, et al. Safe Exploration of State and Action Spaces in Reinforcement Learning, 2012, J. Artif. Intell. Res.
[51] Leslie Pack Kaelbling, et al. Associative Reinforcement Learning: Functions in k-DNF, 1994, Machine Learning.