When People Change their Mind: Off-Policy Evaluation in Non-stationary Recommendation Environments
[1] Tsvi Kuflik, et al. Workshop on information heterogeneity and fusion in recommender systems (HetRec 2010), 2010, RecSys '10.
[2] Maarten de Rijke, et al. OpenSearch: Lessons Learned from an Online Evaluation Campaign, 2018, ACM J. Data Inf. Qual.
[3] Tao Ye, et al. Modeling Musical Taste Evolution with Recurrent Neural Networks, 2018, ArXiv.
[4] Anmol Bhasin, et al. From Infrastructure to Culture: A/B Testing Challenges in Large Scale Social Networks, 2015, KDD.
[5] Fang Liu, et al. A Change-Detection based Framework for Piecewise-stationary Multi-Armed Bandit Problem, 2017, AAAI.
[6] R. Tourangeau. Context Effects on Responses to Attitude Questions: Attitudes as Memory Structures, 1992.
[7] Wei Chu, et al. Unbiased offline evaluation of contextual-bandit-based news article recommendation algorithms, 2010, WSDM '11.
[8] Eric Moulines, et al. On Upper-Confidence Bound Policies for Switching Bandit Problems, 2011, ALT.
[9] Katja Hofmann, et al. Reusing historical interaction data for faster online learning to rank for IR, 2013, DIR.
[10] Claudio Gentile, et al. A Gang of Bandits, 2013, NIPS.
[11] Fernando Diaz, et al. Integration of news content into web results, 2009, WSDM '09.
[12] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.
[13] Thorsten Joachims, et al. Taste Over Time: The Temporal Dynamics of User Preferences, 2013, ISMIR.
[14] Fabio A. González, et al. Performance of Recommendation Systems in Dynamic Streaming Environments, 2007, SDM.
[15] Doina Precup, et al. Eligibility Traces for Off-Policy Policy Evaluation, 2000, ICML.
[16] Susan T. Dumais, et al. Short-Term Satisfaction and Long-Term Coverage: Understanding How Users Tolerate Algorithmic Exploration, 2018, WSDM.
[17] Omar Besbes, et al. Stochastic Multi-Armed-Bandit Problem with Non-stationary Rewards, 2014, NIPS.
[18] John Langford, et al. Doubly Robust Policy Evaluation and Learning, 2011, ICML.
[19] Philip S. Thomas, et al. High-Confidence Off-Policy Evaluation, 2015, AAAI.
[20] P. Whittle. Restless Bandits: Activity Allocation in a Changing World, 1988.
[21] D. Horvitz, et al. A Generalization of Sampling Without Replacement from a Finite Universe, 1952.
[22] John Langford, et al. Off-policy evaluation for slate recommendation, 2016, NIPS.
[23] Wei Chu, et al. A contextual-bandit approach to personalized news article recommendation, 2010, WWW '10.
[24] Shie Mannor, et al. Piecewise-stationary bandit problems with side observations, 2009, ICML '09.
[25] Qingyun Wu, et al. Learning Contextual Bandits in a Non-stationary Environment, 2018, SIGIR.
[26] Joel B. Cohen, et al. The social animal, 1973.
[27] João Gama, et al. On analyzing user preference dynamics with temporal social networks, 2018, Machine Learning.
[28] Miroslav Dudík, et al. Optimal and Adaptive Off-policy Evaluation in Contextual Bandits, 2016, ICML.
[29] Jiahui Liu, et al. Personalized news recommendation based on click behavior, 2010, IUI '10.
[30] Richard S. Sutton, et al. Introduction to Reinforcement Learning, 1998.
[31] Philip S. Thomas, et al. Data-Efficient Off-Policy Policy Evaluation for Reinforcement Learning, 2016, ICML.
[32] M. de Rijke, et al. Online Exploration for Detecting Shifts in Fresh Intent, 2014, CIKM.
[33] Susan T. Dumais, et al. Understanding temporal query dynamics, 2011, WSDM '11.
[34] Susan T. Dumais, et al. Modeling and predicting behavioral dynamics on the web, 2012, WWW.
[35] F. Strack, et al. Context Effects in Attitude Surveys: Applying Cognitive Theory to Social Research, 1991.
[36] Philip S. Thomas, et al. Predictive Off-Policy Policy Evaluation for Nonstationary Decision Problems, with Applications to Digital Marketing, 2017, AAAI.
[37] Chris P. Tsokos, et al. Mathematical Statistics with Applications, 2009.
[38] John Langford, et al. Sample-efficient Nonstationary Policy Evaluation for Contextual Bandits, 2012, UAI.
[39] Thorsten Joachims, et al. Optimizing search engines using clickthrough data, 2002, KDD.
[40] Modeling of Holiday Effects and Seasonality in Daily Time Series, 2018.
[41] J. Robins, et al. Doubly Robust Estimation in Missing Data and Causal Inference Models, 2005, Biometrics.
[42] A. Tversky, et al. Judgment under Uncertainty: Heuristics and Biases, 1974, Science.