Succinct Matrix Approximation and Efficient k-NN Classification

This work shows that, in place of the polynomial bounds in the previous literature, a sharper bound of exponential form holds for the L2 norm of a random matrix of arbitrary shape. Based on this bound, a nonuniform sampling method is presented that succinctly approximates a matrix with a sparse binary one, thereby reducing the computational load of the k-NN classifier in both time and storage. The method is also pass-efficient: sampling and quantization are combined in a single step, so the whole process completes within one pass over the input matrix. In evaluations of compression ratio and reconstruction error, the sampling method yields succinct and tight approximations of the input matrices. The most significant finding of the classification experiment is that the k-NN classifier built on the approximation can even outperform the standard one, which provides further evidence that the method captures intrinsic characteristics of the input matrices.
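
To make the idea of combining sampling and quantization in one pass concrete, here is a minimal sketch. It is not the paper's scheme, since the abstract does not state the sampling probabilities. The sketch assumes a magnitude-proportional rule: entry a is kept with probability |a| / bound and only its sign is stored, so the reconstructed entry sign * bound equals a in expectation. The names `sample_and_quantize`, `reconstruct`, and `bound` are hypothetical, and `bound` is assumed to be a known upper limit on the entry magnitudes (computing it from the data would cost an extra pass).

```python
import random


def sample_and_quantize(rows, bound, seed=0):
    """Single pass over `rows`: keep each entry a with probability |a| / bound
    (an assumed rule, not taken from the paper) and store only its sign."""
    rng = random.Random(seed)
    kept = {}  # (row index, column index) -> +1 or -1
    for i, row in enumerate(rows):
        for j, a in enumerate(row):
            p = min(abs(a) / bound, 1.0)  # nonuniform: larger entries survive more often
            if rng.random() < p:          # p == 0 never passes, so zeros are never kept
                kept[(i, j)] = 1 if a > 0 else -1
    return kept


def reconstruct(kept, bound, shape):
    """Expand the sparse sign pattern back into a dense matrix with entries in {-bound, 0, +bound}."""
    n_rows, n_cols = shape
    dense = [[0.0] * n_cols for _ in range(n_rows)]
    for (i, j), sign in kept.items():
        dense[i][j] = sign * bound
    return dense


if __name__ == "__main__":
    matrix = [[0.9, 0.0, -0.1], [0.05, -1.0, 0.3]]
    sketch = sample_and_quantize(matrix, bound=1.0, seed=42)
    print(len(sketch), "of 6 entries kept")
    print(reconstruct(sketch, bound=1.0, shape=(2, 3)))
```

Only the positions and signs of the surviving entries need to be stored, along with the single scalar `bound`, which is where the storage saving in this sketch comes from.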
