A Contextual Bandit Approach to Personalized Online Recommendation via Sparse Interactions

Online recommendation is an important feature in many applications. In practice, interaction between users and the recommender system may be sparse, i.e., users do not always respond to the recommendations they are shown. For example, some users prefer to skim the recommendations rather than click through to the details. A response of 0 is therefore not necessarily a negative response; it may simply be a non-response. Distinguishing the two cases becomes even harder when only one item is recommended to the user at a time and little further information is available. Most existing recommendation strategies ignore the difference between non-responses and negative responses. In this paper, we propose a novel approach, named SAOR, for making online recommendations under sparse interactions. SAOR uses positive and negative responses to build the user preference model and ignores all non-responses. We provide a regret analysis of SAOR, and experiments on both real and synthetic datasets show that SAOR outperforms competing methods.
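
To make the idea of ignoring non-responses concrete, the following is a minimal sketch and not the SAOR algorithm itself. It assumes a LinUCB-style linear bandit as a stand-in learner and assumes that each round's feedback already arrives labeled as positive, negative, or non-response; the abstract does not say how SAOR tells the last two apart. The class name SparseFeedbackLinUCB, the three feedback constants, and the parameter alpha are hypothetical and chosen only for illustration.

```python
import math

# Hypothetical feedback labels. How a non-response is detected is assumed, not given.
POSITIVE, NEGATIVE, NO_RESPONSE = 1, 0, None


class SparseFeedbackLinUCB:
    """LinUCB-style learner that skips the model update on non-responses."""

    def __init__(self, dim, alpha=1.0):
        self.dim = dim
        self.alpha = alpha  # exploration weight (assumed hyperparameter)
        # Inverse of the regularized design matrix A, initialized to the identity.
        self.A_inv = [[1.0 if i == j else 0.0 for j in range(dim)] for i in range(dim)]
        self.b = [0.0] * dim  # running sum of reward * feature vector
        self.n_updates = 0

    def _matvec(self, x):
        # Compute A_inv @ x with plain lists.
        return [sum(row[j] * x[j] for j in range(self.dim)) for row in self.A_inv]

    def score(self, x):
        """Upper confidence bound for an item with feature vector x."""
        theta = self._matvec(self.b)  # theta = A^{-1} b
        a_inv_x = self._matvec(x)
        mean = sum(t * xi for t, xi in zip(theta, x))
        width = math.sqrt(max(sum(xi * v for xi, v in zip(x, a_inv_x)), 0.0))
        return mean + self.alpha * width

    def recommend(self, items):
        """Return the index of the single item to show this round."""
        return max(range(len(items)), key=lambda i: self.score(items[i]))

    def update(self, x, feedback):
        """Update on positive or negative feedback; leave the model untouched otherwise."""
        if feedback is NO_RESPONSE:
            return False
        a_inv_x = self._matvec(x)
        denom = 1.0 + sum(xi * v for xi, v in zip(x, a_inv_x))
        # Sherman-Morrison rank-one update of A^{-1} after A <- A + x x^T.
        for i in range(self.dim):
            for j in range(self.dim):
                self.A_inv[i][j] -= a_inv_x[i] * a_inv_x[j] / denom
        for i in range(self.dim):
            self.b[i] += feedback * x[i]
        self.n_updates += 1
        return True


# Usage: two items with one-hot features, three rounds of feedback.
learner = SparseFeedbackLinUCB(dim=2)
items = [[1.0, 0.0], [0.0, 1.0]]
learner.update(items[0], NO_RESPONSE)  # ignored: the model does not change
learner.update(items[0], NEGATIVE)     # counted as a reward of 0
learner.update(items[1], POSITIVE)     # counted as a reward of 1
choice = learner.recommend(items)
```

In this sketch a non-response leaves both the design matrix and the reward vector unchanged, so it neither lowers the estimated preference for the item nor shrinks its confidence width, whereas a negative response does both. A baseline that treats every 0 as negative would instead apply the update on all three rounds.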