Stochastic Low-Rank Bandits

Many problems in computer vision and recommender systems involve low-rank matrices. In this work, we study the problem of finding the maximum entry of a stochastic low-rank matrix from sequential observations. At each step, a learning agent chooses pairs of row and column arms, and receives the noisy product of their latent values as a reward. The main challenge is that the latent values are unobserved. We identify a class of non-negative matrices whose maximum entry can be found statistically efficiently and propose an algorithm for finding them, which we call LowRankElim. We derive a $\DeclareMathOperator{\poly}{poly} O((K + L) \poly(d) \Delta^{-1} \log n)$ upper bound on its $n$-step regret, where $K$ is the number of rows, $L$ is the number of columns, $d$ is the rank of the matrix, and $\Delta$ is the minimum gap. The bound depends on other problem-specific constants that clearly do not depend $K L$. To the best of our knowledge, this is the first such result in the literature.

[1]  Emmanuel J. Candès,et al.  Exact Matrix Completion via Convex Optimization , 2009, Found. Comput. Math..

[2]  Peter Auer,et al.  Finite-time Analysis of the Multiarmed Bandit Problem , 2002, Machine Learning.

[3]  Joel A. Tropp,et al.  Factoring nonnegative matrices with linear programs , 2012, NIPS.

[4]  Shie Mannor,et al.  Latent Bandits , 2014, ICML.

[5]  Alexandros G. Dimakis,et al.  Contextual Bandits with Latent Confounders: An NMF Approach , 2016, AISTATS.

[6]  Long Tran-Thanh,et al.  Efficient Thompson Sampling for Online Matrix-Factorization Recommendation , 2015, NIPS.

[7]  Justin K. Romberg,et al.  An Overview of Low-Rank Matrix Recovery From Incomplete Observations , 2016, IEEE Journal of Selected Topics in Signal Processing.

[8]  Patrick Seemann,et al.  Matrix Factorization Techniques for Recommender Systems , 2014 .

[9]  Emmanuel J. Candès,et al.  Matrix Completion With Noise , 2009, Proceedings of the IEEE.

[10]  Zheng Wen,et al.  Stochastic Rank-1 Bandits , 2016, AISTATS.

[11]  Shuai Li,et al.  Collaborative Filtering Bandits , 2015, SIGIR.

[12]  Robert D. Nowak,et al.  Active Positive Semidefinite Matrix Completion: Algorithms, Theory and Applications , 2017, AISTATS.

[13]  Akshay Krishnamurthy,et al.  Low-Rank Matrix and Tensor Completion via Adaptive Sampling , 2013, NIPS.

[14]  Shuai Li,et al.  Online Clustering of Bandits , 2014, ICML.

[15]  Lior Rokach,et al.  Introduction to Recommender Systems Handbook , 2011, Recommender Systems Handbook.

[16]  Peter Auer,et al.  UCB revisited: Improved regret bounds for the stochastic multi-armed bandit problem , 2010, Period. Math. Hung..