Paper Information - Stochastic Low-Rank Bandits

Stochastic Low-Rank Bandits

Many problems in computer vision and recommender systems involve low-rank matrices. In this work, we study the problem of finding the maximum entry of a stochastic low-rank matrix from sequential observations. At each step, a learning agent chooses pairs of row and column arms, and receives the noisy product of their latent values as a reward. The main challenge is that the latent values are unobserved. We identify a class of non-negative matrices whose maximum entry can be found statistically efficiently and propose an algorithm for finding them, which we call LowRankElim. We derive an $O((K + L) \operatorname{poly}(d) \Delta^{-1} \log n)$ upper bound on its $n$-step regret, where $K$ is the number of rows, $L$ is the number of columns, $d$ is the rank of the matrix, and $\Delta$ is the minimum gap. The bound depends on other problem-specific constants that clearly do not depend on $K L$. To the best of our knowledge, this is the first such result in the literature.
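The interaction model described above can be illustrated with a small simulation. This is only a sketch of the problem setting, not of LowRankElim itself. The Bernoulli reward noise, the Dirichlet-sampled latent factors, the uniform-random baseline policy, and all function names (`make_low_rank_means`, `pull`, `uniform_policy_regret`) are assumptions made for illustration and do not come from the paper.

```python
import numpy as np


def make_low_rank_means(K, L, d, rng):
    """Build a non-negative K x L mean-reward matrix of rank at most d.

    Assumption: each latent row vector lies on the probability simplex,
    which keeps every entry of U @ V.T inside [0, 1].
    """
    U = rng.dirichlet(np.ones(d), size=K)  # latent values of the row arms
    V = rng.dirichlet(np.ones(d), size=L)  # latent values of the column arms
    return U @ V.T


def pull(means, i, j, rng):
    """Return a noisy reward for the pair (row arm i, column arm j).

    Assumption: Bernoulli noise whose mean is the product of latent values.
    """
    return float(rng.random() < means[i, j])


def uniform_policy_regret(means, n, rng):
    """Expected n-step regret of a baseline that picks pairs uniformly at random."""
    K, L = means.shape
    best = means.max()  # the maximum entry the agent is trying to find
    regret = 0.0
    for _ in range(n):
        i = int(rng.integers(K))
        j = int(rng.integers(L))
        pull(means, i, j, rng)  # the agent only ever sees this noisy reward
        regret += best - means[i, j]
    return regret


rng = np.random.default_rng(0)
means = make_low_rank_means(K=8, L=6, d=2, rng=rng)
print("rank:", np.linalg.matrix_rank(means))
print("regret of uniform policy:", uniform_policy_regret(means, n=1000, rng=rng))
```

The uniform baseline never learns, so its regret grows linearly in $n$. An algorithm with the logarithmic bound stated in the abstract would eventually concentrate its pulls on the maximum entry.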
