Subsampling for Ridge Regression via Regularized Volume Sampling

Given $n$ vectors $\mathbf{x}_i\in \mathbb{R}^d$, we want to fit a linear regression model for noisy labels $y_i\in\mathbb{R}$. The ridge estimator is a classical solution to this problem. However, when labels are expensive, we are forced to select only a small subset of vectors $\mathbf{x}_i$ for which we obtain the labels $y_i$. We propose a new procedure for selecting the subset of vectors, such that the ridge estimator obtained from that subset offers strong statistical guarantees in terms of the mean squared prediction error over the entire dataset of $n$ labeled vectors. The number of labels needed is proportional to the statistical dimension of the problem, which is often much smaller than $d$. Our method is an extension of a joint subsampling procedure called volume sampling. A second major contribution is that we speed up volume sampling so that it is essentially as efficient as leverage score sampling, which is the main i.i.d. subsampling procedure for this task. Finally, we show theoretically and experimentally that volume sampling has a clear advantage over any i.i.d. sampling when labels are expensive.
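To make the setup concrete, below is a minimal, brute-force sketch in Python of subsampled ridge regression under a regularized volume-sampling-style distribution. It assumes the subset $S$ of size $k$ is drawn with probability proportional to $\det(\mathbf{X}_S^\top \mathbf{X}_S + \lambda\mathbf{I})$, which is my reading of "regularized volume sampling" and should be checked against the paper; the sampler here simply enumerates all $\binom{n}{k}$ subsets, so it is for illustration on tiny datasets only and does not reflect the fast algorithm the abstract refers to. The function names (`regularized_volume_sample`, `ridge_fit`) and the use of the noiseless responses to score prediction error are illustrative assumptions, not the paper's API.

```python
import itertools
import numpy as np

def regularized_volume_sample(X, k, lam, rng):
    """Brute-force sampler: draw a size-k subset S with probability
    proportional to det(X_S^T X_S + lam * I).  Exponential in n; for
    illustration only -- the paper's point is that this can be done fast."""
    n, d = X.shape
    subsets = list(itertools.combinations(range(n), k))
    weights = np.array([
        np.linalg.det(X[list(S)].T @ X[list(S)] + lam * np.eye(d))
        for S in subsets
    ])
    probs = weights / weights.sum()
    idx = rng.choice(len(subsets), p=probs)
    return list(subsets[idx])

def ridge_fit(X, y, lam):
    """Classical ridge estimator w = (X^T X + lam*I)^{-1} X^T y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d, k, lam = 12, 3, 5, 1.0   # tiny sizes: the sampler enumerates all C(n, k) subsets
    X = rng.standard_normal((n, d))
    w_true = rng.standard_normal(d)
    y = X @ w_true + 0.1 * rng.standard_normal(n)

    S = regularized_volume_sample(X, k, lam, rng)   # query labels only for rows in S
    w_hat = ridge_fit(X[S], y[S], lam)              # ridge estimator from the subsample
    mse = np.mean((X @ w_hat - X @ w_true) ** 2)    # prediction error over all n points
    print(f"selected rows: {S}, full-data MSE of subsampled ridge: {mse:.4f}")
```

The sketch mirrors the workflow described in the abstract: select a small subset, obtain its labels, fit ridge on that subset, and evaluate mean squared prediction error over all $n$ points.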
