Tensor Space Model for Hypertext Representation

We investigate the basics of tensor-based hypertext representation and perform experiments with this novel representation model. Most documents have an inherent hierarchical structure, which makes multidimensional representations such as those offered by tensor objects desirable. We focus on the advantages of the Tensor Space Model, in which documents are represented using second-order tensors. We exploit the local structure and neighborhood recommendation encapsulated by the proposed representation. We define a distance metric on the tensor space of hypertext documents that generalizes the distance metric defined on the vector space model. Our results provide evidence that the tensor-based model is very efficient for clustering and classification of hypertext documents compared to the traditional vector-based model.
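
To make the idea of a second-order tensor representation concrete, the following is a minimal sketch and not the paper's actual construction. It assumes that one tensor mode indexes structural parts of a page (the hypothetical sections `title`, `anchor`, and `body`) and the other indexes vocabulary terms, and it assumes the Frobenius norm as the tensor distance. The abstract states only that the tensor metric generalizes the vector space metric; the Frobenius norm is one metric with that property, since it reduces to the Euclidean distance when the tensor has a single row.

```python
import math
from collections import Counter

# Hypothetical structural sections of a hypertext document.
# This choice is an assumption for illustration, not taken from the paper.
SECTIONS = ("title", "anchor", "body")


def to_tensor(doc, vocab):
    """Build a second-order tensor (sections x terms) of raw term counts."""
    rows = []
    for section in SECTIONS:
        counts = Counter(doc.get(section, "").lower().split())
        rows.append([float(counts[term]) for term in vocab])
    return rows


def tensor_distance(a, b):
    """Frobenius distance between two equally shaped second-order tensors.

    Assumed metric: the square root of the sum of squared entry differences.
    """
    return math.sqrt(
        sum((x - y) ** 2 for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))
    )


def vector_distance(u, v):
    """Euclidean distance, as used in the traditional vector space model."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


vocab = ["tensor", "web", "model"]
doc_a = {"title": "tensor model", "anchor": "web", "body": "tensor tensor web model"}
doc_b = {"title": "web model", "anchor": "web web", "body": "web model model"}

tensor_a = to_tensor(doc_a, vocab)
tensor_b = to_tensor(doc_b, vocab)
print(tensor_distance(tensor_a, tensor_b))
```

Keeping the sections as separate rows preserves where in the page each term occurs, which a single bag-of-words vector would merge. With a one-row tensor, `tensor_distance` and `vector_distance` return the same value, which is the sense in which this sketch generalizes the vector space metric.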
