Modeling implicit feedback based on bandit learning for recommendation

Abstract Implicit feedback such as clicks and favorites has been widely studied and applied in recommender systems because of its low collection cost and rich hidden information. In this paper, recommendation based on implicit feedback from multiple behaviors is formalized as a multi-armed bandit (MAB) problem, and an online recommendation model based on MAB is proposed. In the model, we use item categories as arms, rather than individual items as in existing related models, so that the number of arms is fixed and its scale remains controllable, which keeps the computational cost manageable. This design also increases the diversity of recommendations. In addition, we divide implicit feedback into strong-interaction, weak-interaction, and non-interaction behaviors to estimate user preferences more accurately. Almost all recommendation models face two important challenges: the cold-start problem and the exploration-exploitation (EE) trade-off. In our model, a differentiated recommendation strategy is proposed to alleviate the cold-start problem, and a bandit learning algorithm based on Thompson sampling is proposed to balance exploration and exploitation by modeling the expected reward of each arm with an independent Beta distribution and using multi-behavior implicit feedback to update the posterior. We verify the effectiveness of the proposed model on three public datasets, discuss the factors that affect it, and examine its robustness in cold-start settings.

Index Terms—Implicit feedback, multi-armed bandit, recommender systems, Thompson sampling.
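
To make the abstract's sampling scheme concrete, the following is a minimal sketch of Thompson sampling over item categories with Beta-distributed expected rewards, where graded implicit feedback updates the posterior of the chosen arm. The class name, the reward values assigned to strong-, weak-, and non-interaction feedback, and the uniform Beta(1, 1) prior are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

# Illustrative reward mapping for multi-behavior implicit feedback
# (values are assumptions, not taken from the paper).
REWARDS = {"strong": 1.0, "weak": 0.5, "none": 0.0}

class CategoryThompsonSampler:
    """Thompson sampling over item categories (arms) with Beta posteriors."""

    def __init__(self, n_categories):
        # Beta(1, 1) prior (uniform) on each category's expected reward.
        self.alpha = np.ones(n_categories)
        self.beta = np.ones(n_categories)

    def select_arm(self):
        # Draw one sample from each arm's posterior and recommend the
        # category with the largest sampled expected reward.
        samples = np.random.beta(self.alpha, self.beta)
        return int(np.argmax(samples))

    def update(self, arm, feedback):
        # Treat the graded reward as a fractional Bernoulli outcome and
        # update the posterior of the chosen arm accordingly.
        r = REWARDS[feedback]
        self.alpha[arm] += r
        self.beta[arm] += 1.0 - r

# Usage: one round of recommendation followed by observed feedback.
sampler = CategoryThompsonSampler(n_categories=10)
arm = sampler.select_arm()
sampler.update(arm, feedback="weak")
```

Under this kind of scheme, arms with little feedback keep wide posteriors and are still sampled occasionally (exploration), while arms with consistently strong feedback concentrate near high expected reward and are recommended more often (exploitation).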