From subspace clustering to full-rank matrix completion

Subspace clustering is the problem of finding a multi-subspace representation that best fits a collection of points taken from a high-dimensional space. This type of structure occurs naturally in many applications, ranging from bioinformatics and image/text clustering to semi-supervised learning. The companion paper [3] shows that robust and tractable subspace clustering is possible with minimal requirements on the orientation of the subspaces and the number of samples per subspace. This note summarizes a forthcoming work [1] on subspace clustering when some of the entries in the data matrix are missing. This problem may also be viewed as a generalization of standard low-rank matrix completion to cases where the matrix is of high or potentially full rank. Synthetic and real data experiments confirm the effectiveness of these methods.

1 Problem formulation and model

Consider a real-valued n × N matrix X. We assume that the columns of X lie in a union of L unknown linear subspaces of unknown dimensions. A small subset of the entries of this matrix is revealed. The goal is twofold: 1) partition the columns into clusters according to their subspace of origin and approximate the underlying subspaces, and 2) impute the missing entries. Throughout, we assume that each entry of X is observed with probability 1 − δ.

2 Method

Here we explain our method for subspace clustering with missing data. Once the correct clustering is found, one can apply any low-rank matrix recovery algorithm to each cluster to complete the missing entries. To introduce our method, we first study the problem when all entries are revealed.

2.1 No missing entries

Most spectral clustering algorithms follow a two-step procedure: I) construct a weighted graph W that captures the similarity between any pair of points; II) select clusters by applying spectral clustering techniques to W. Following [2,3,6], we build the affinity graph in Step I by finding the sparsest expansion of each column x^{(i)} of X as a linear combination of the other columns. Under generic conditions, one expects that the sparsest representation of x^{(i)} selects only vectors from the subspace in which x^{(i)} lies. This leads to the following sequence of optimization problems

\[
\min_{\beta \in \mathbb{R}^N} \|\beta\|_1 \quad \text{subject to} \quad X\beta = x^{(i)} \ \text{and} \ \beta_i = 0. \tag{2.1}
\]

One then collects the outcomes of these N optimization problems as the columns of a matrix B and …
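The following is a minimal sketch of the fully observed case described above: each column is expressed as a sparse combination of the other columns by solving (2.1), the resulting coefficient matrix B is symmetrized into an affinity graph (here via the common choice W = |B| + |B|^T), and the graph is fed to spectral clustering. The use of cvxpy, scikit-learn, and the helper name `sparse_affinity` are choices made for illustration and are not part of the original note.

```python
# Sketch of the no-missing-entries case: solve the l1 program (2.1) for each
# column, build an affinity graph, and cluster it spectrally.
import numpy as np
import cvxpy as cp
from sklearn.cluster import SpectralClustering

def sparse_affinity(X):
    """Solve (2.1) for every column of X and return the affinity |B| + |B|^T."""
    n, N = X.shape
    B = np.zeros((N, N))
    for i in range(N):
        beta = cp.Variable(N)
        constraints = [X @ beta == X[:, i], beta[i] == 0]
        cp.Problem(cp.Minimize(cp.norm1(beta)), constraints).solve()
        B[:, i] = beta.value
    return np.abs(B) + np.abs(B).T

# Synthetic example: points drawn from L random d-dimensional subspaces of R^n.
rng = np.random.default_rng(0)
n, d, L, pts = 20, 3, 2, 30
X = np.hstack([rng.standard_normal((n, d)) @ rng.standard_normal((d, pts))
               for _ in range(L)])
W = sparse_affinity(X)
labels = SpectralClustering(n_clusters=L, affinity="precomputed").fit_predict(W)
print(labels)  # columns from the same subspace should share a label
```

With noiseless data drawn exactly from low-dimensional subspaces, the equality constraint in (2.1) is feasible whenever each subspace contains enough other columns; noisy or missing-data variants would relax this constraint.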

[1] Robert D. Nowak et al. High-Rank Matrix Completion and Subspace Clustering with Missing Data. arXiv, 2011.

[2] Emmanuel J. Candès et al. Robust Subspace Clustering. arXiv, 2013.

[3] Andrea Montanari et al. Matrix completion from a few entries. IEEE International Symposium on Information Theory, 2009.

[4] Ehsan Elhamifar et al. Sparse subspace clustering. IEEE Conference on Computer Vision and Pattern Recognition, 2009.

[5] Emmanuel J. Candès et al. A Geometric Analysis of Subspace Clustering with Outliers. arXiv, 2011.