Spectral analysis of data

Experimental evidence suggests that spectral techniques are valuable for a wide range of applications. A partial list of such applications includes (i) semantic analysis of documents, used to cluster documents into areas of interest; (ii) collaborative filtering, the reconstruction of missing data items; and (iii) determining the relative importance of documents based on citation/link structure. Intuitive arguments can explain some of the phenomena that have been observed, but little theoretical study has been done. In this paper we present a model for framing data mining tasks and a unified approach to solving the resulting data mining problems using spectral analysis. These results give strong justification for the use of spectral techniques in latent semantic indexing, collaborative filtering, and web site ranking.
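As a rough illustration of what a spectral technique for collaborative filtering can look like, the sketch below estimates the missing entries of a partially observed matrix by projecting it onto its top-k singular vectors. This is a generic truncated-SVD heuristic, not the algorithm of this paper. The use of NumPy, the function names `low_rank_approx` and `reconstruct_missing`, and the zero-fill-and-rescale-by-1/p step are assumptions made for illustration. By the Eckart-Young theorem, keeping the k leading singular triplets gives the best rank-k approximation in both the Frobenius and spectral norms.

```python
import numpy as np


def low_rank_approx(A: np.ndarray, k: int) -> np.ndarray:
    """Best rank-k approximation of A via truncated SVD (Eckart-Young)."""
    # Thin SVD: A = U @ diag(s) @ Vt, with singular values s sorted in descending order.
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    # Keep only the k leading singular triplets.
    return (U[:, :k] * s[:k]) @ Vt[:k, :]


def reconstruct_missing(A_obs: np.ndarray, mask: np.ndarray, k: int) -> np.ndarray:
    """Hypothetical helper: estimate entries of A_obs where mask is False.

    Assumption: missing entries are set to 0 and every entry is divided by the
    observed fraction p, so the filled matrix is approximately unbiased when
    entries are missing uniformly at random. The rank-k projection then
    smooths out the resulting noise.
    """
    p = mask.mean()  # fraction of observed entries
    if p == 0:
        raise ValueError("no observed entries")
    filled = np.where(mask, A_obs, 0.0) / p
    return low_rank_approx(filled, k)


# Usage: a synthetic rank-1 "ratings" matrix with roughly 70% of entries observed.
rng = np.random.default_rng(0)
true = np.outer(rng.normal(size=30), rng.normal(size=20))
mask = rng.random(true.shape) < 0.7
estimate = reconstruct_missing(true, mask, k=1)
```

The latent rank k is left to the caller here; in practice it is an assumption about how many underlying factors drive the data.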
