Bandit-based Optimization of Multiple Objectives on a Music Streaming Platform

Recommender systems powering online multi-stakeholder platforms often face the challenge of jointly optimizing multiple objectives in an attempt to efficiently match suppliers and consumers. Examples of such objectives include user behavioral metrics (e.g., clicks, streams, dwell time), supplier exposure objectives (e.g., diversity), and platform-centric objectives (e.g., promotions). Jointly optimizing multiple metrics in online recommender systems remains a challenging task. Recent work has demonstrated the prowess of contextual bandits in powering recommendation systems that serve recommendations of interest to users. This paper extends contextual bandits to the multi-objective setting in order to power recommendations on multi-stakeholder platforms. Specifically, in a contextual bandit setting, we learn a recommendation policy that can optimize multiple objectives simultaneously in a fair way. We formalize this multi-objective online optimization problem using the Generalized Gini index (GGI) aggregation function, which combines and balances multiple objectives. We propose an online gradient ascent learning algorithm to maximize the long-term vectorial rewards for the different objectives, scalarized using the GGI function. Through extensive experiments on simulated data and large-scale music recommendation data from Spotify, a streaming platform, we show that the proposed algorithm learns a policy that balances the disparate objectives better than other state-of-the-art approaches.
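To make the scalarization concrete, the following is a minimal sketch of the GGI aggregation the abstract refers to. It assumes the common formulation in which the reward vector is sorted in ascending order and multiplied by a non-increasing weight vector, so the worst-performing objective receives the largest weight; the geometric weights `0.5**i` are an illustrative default, not a value specified by the paper.

```python
import numpy as np

def ggi(rewards, weights=None):
    """Generalized Gini index (GGI) aggregation of a vectorial reward.

    Sorts the rewards in ascending order so that the worst-off
    objective is paired with the largest weight, which rewards
    balanced (fair) outcomes over lopsided ones.
    """
    r = np.sort(np.asarray(rewards, dtype=float))  # ascending: worst objective first
    if weights is None:
        # Illustrative non-increasing weights w_1 >= w_2 >= ... (an assumption)
        weights = 0.5 ** np.arange(r.size)
    return float(np.dot(weights, r))
```

With weights (1, 0.5), a balanced reward vector (1, 1) scores 1.5, while the lopsided (2, 0) with the same total scores only 1.0, illustrating why maximizing GGI pushes the policy toward fairness across objectives.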