Balancing between Estimated Reward and Uncertainty during News Article Recommendation for ICML 2012 Exploration and Exploitation Challenge

Automatically recommending relevant content to users of a web service is an important problem tied to the revenue of many Internet companies. The ICML 2012 Exploration & Exploitation Workshop held an open challenge that aimed to build a state-of-the-art news article recommendation system on the Yahoo! platform. We propose an efficient scoring model that recommends the news article with the highest score during each user visit. The scoring model exploits by recommending the article with the highest estimated reward, and explores articles with high reward potential through uncertainty measures. The model accounts for three important aspects: the global quality of articles, the personal preferences of users, and time effects. Furthermore, during the challenge, we adopted a systematic parameter tuning process to optimize the performance of the model. The tuned scoring model won first place in phase one of the challenge.
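The idea of scoring each candidate article as an estimated reward plus an uncertainty term can be illustrated with a small sketch. This is not the authors' tuned model: it assumes a UCB1-style bonus (Auer et al., 2002) for the uncertainty measure, uses a per-update discount factor as a rough stand-in for time effects, and omits the personal-preference component entirely. The class and parameter names (`UCBScorer`, `alpha`, `gamma`) are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class ArticleStats:
    """Discounted click and impression counts for one article (hypothetical structure)."""
    clicks: float = 0.0
    views: float = 0.0


class UCBScorer:
    """UCB1-style scorer: estimated reward (click rate) plus an uncertainty bonus.

    `alpha` weights the exploration bonus and `gamma` discounts older feedback
    as a stand-in for time effects. Both are hypothetical tuning parameters,
    not values taken from the paper.
    """

    def __init__(self, alpha: float = 1.0, gamma: float = 1.0):
        self.alpha = alpha
        self.gamma = gamma
        self.stats: dict[str, ArticleStats] = {}
        self.total_views = 0.0  # equals the sum of all (discounted) article views

    def score(self, article: str) -> float:
        s = self.stats.get(article)
        if s is None or s.views == 0:
            return math.inf  # never shown yet: force one round of exploration
        estimated_reward = s.clicks / s.views  # exploitation term
        # Exploration term: shrinks as the article accumulates impressions.
        uncertainty = math.sqrt(2.0 * math.log(self.total_views) / s.views)
        return estimated_reward + self.alpha * uncertainty

    def recommend(self, candidates: list[str]) -> str:
        # Highest score wins; max() keeps the earliest candidate on ties.
        return max(candidates, key=self.score)

    def update(self, article: str, clicked: bool) -> None:
        # Shrink all past evidence so that recent feedback counts more.
        for s in self.stats.values():
            s.clicks *= self.gamma
            s.views *= self.gamma
        self.total_views = self.total_views * self.gamma + 1.0
        s = self.stats.setdefault(article, ArticleStats())
        s.views += 1.0
        s.clicks += 1.0 if clicked else 0.0


if __name__ == "__main__":
    scorer = UCBScorer(alpha=0.5, gamma=0.999)
    articles = ["a1", "a2", "a3"]
    # Simulated feedback for three visits: only the second shown article is clicked.
    for clicked in [False, True, False]:
        choice = scorer.recommend(articles)
        scorer.update(choice, clicked)
    print(scorer.recommend(articles))  # the clicked article now scores highest
```

With `gamma=1.0` the sketch reduces to plain, undiscounted UCB1 scoring. Values slightly below 1 let recent clicks dominate, while `alpha` trades exploration against exploitation; these are the sort of knobs a systematic parameter tuning process could search over.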

Hsuan-Tien Lin | Ku-Chun Chou