Approximate Sparse Linear Regression

In the Sparse Linear Regression (SLR) problem, given a $d \times n$ matrix $M$ and a $d$-dimensional query $q$, the goal is to compute a $k$-sparse $n$-dimensional vector $\tau$ such that the error $||M \tau-q||$ is minimized. This problem is equivalent to the following geometric problem: given a set $P$ of $n$ points and a query point $q$ in $d$ dimensions, find the closest $k$-dimensional subspace to $q$, that is spanned by a subset of $k$ points in $P$. In this paper, we present data-structures/algorithms and conditional lower bounds for several variants of this problem (such as finding the closest induced $k$ dimensional flat/simplex instead of a subspace). In particular, we present approximation algorithms for the online variants of the above problems with query time $\tilde O(n^{k-1})$, which are of interest in the "low sparsity regime" where $k$ is small, e.g., $2$ or $3$. For $k=d$, this matches, up to polylogarithmic factors, the lower bound that relies on the affinely degenerate conjecture (i.e., deciding if $n$ points in $\mathbb{R}^d$ contains $d+1$ points contained in a hyperplane takes $\Omega(n^d)$ time). Moreover, our algorithms involve formulating and solving several geometric subproblems, which we believe to be of independent interest.

[1]  Lihi Zelnik-Manor,et al.  Approximate Nearest Subspace Search , 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[2]  Rafail Ostrovsky,et al.  Efficient search for approximate nearest neighbor in high dimensional spaces , 1998, STOC '98.

[3]  Sunil Arya,et al.  An optimal algorithm for approximate nearest neighbor searching fixed dimensions , 1998, JACM.

[4]  Mihai Patrascu,et al.  On the possibility of faster SAT algorithms , 2010, SODA '10.

[5]  Allen Y. Yang,et al.  Robust Face Recognition via Sparse Representation , 2009, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[6]  S. Mallat,et al.  Adaptive greedy approximations , 1997 .

[7]  Avner Magen,et al.  Dimensionality Reductions That Preserve Volumes and Distance to Affine Spaces, and Their Algorithmic Applications , 2002, RANDOM.

[8]  R. Tibshirani Regression Shrinkage and Selection via the Lasso , 1996 .

[9]  Emmanuel J. Candès,et al.  Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information , 2004, IEEE Transactions on Information Theory.

[10]  Dean P. Foster,et al.  Variable Selection is Hard , 2014, COLT.

[11]  Mark de Berg,et al.  Computational geometry: algorithms and applications , 1997 .

[12]  Balas K. Natarajan,et al.  Sparse Approximate Solutions to Linear Systems , 1995, SIAM J. Comput..

[13]  Alexandr Andoni,et al.  Near-Optimal Hashing Algorithms for Approximate Nearest Neighbor in High Dimensions , 2006, 2006 47th Annual IEEE Symposium on Foundations of Computer Science (FOCS'06).

[14]  Raimund Seidel,et al.  Erratum to Better Lower Bounds on Detecting Affine and Spherical Degeneracies , 1993, Discret. Comput. Geom..

[15]  David L Donoho,et al.  Compressed sensing , 2006, IEEE Transactions on Information Theory.

[16]  R. Tibshirani,et al.  Regression shrinkage and selection via the lasso: a retrospective , 2011 .

[17]  Sepideh Mahabadi Approximate Nearest Line Search in High Dimensions , 2015, SODA.

[18]  Michael A. Saunders,et al.  Atomic Decomposition by Basis Pursuit , 1998, SIAM J. Sci. Comput..

[19]  Piotr Indyk,et al.  Approximate nearest neighbors: towards removing the curse of dimensionality , 1998, STOC '98.