Clustering hypertext with applications to web searching

A method and structure for searching a database containing hypertext documents. The database is searched using a query to produce a set of hypertext documents, and that set is then geometrically clustered using a toric k-means similarity measure so that the documents within each cluster are similar to one another. The clustering has linear time complexity with respect to the set of hypertext documents. The similarity measure comprises a weighted sum of maximized individual components of the set of hypertext documents, and the clustering is based upon three features of each hypertext document: the words it contains, its out-links, and its in-links.
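
The following is a minimal sketch of how such a clustering could look, not the method as claimed. It assumes each document is a triplet of vectors (word counts, out-link indicators, in-link indicators), that each component is scaled to unit length, and that the similarity is a weighted sum of the three per-component inner products. The names `normalize`, `similarity`, and `toric_kmeans`, the weights, the toy data, and the choice of the first k documents as initial concepts are all illustrative assumptions; the abstract does not specify them.

```python
import math

def normalize(vec):
    """Scale a vector to unit Euclidean length; all-zero vectors are returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)

def similarity(doc, concept, weights):
    """Weighted sum of per-component inner products (cosines when components are unit vectors)."""
    return sum(
        w * sum(a * b for a, b in zip(d, c))
        for w, d, c in zip(weights, doc, concept)
    )

def toric_kmeans(docs, k, weights, iterations=10):
    """Assumed k-means-style loop over (words, out-links, in-links) triplets."""
    docs = [tuple(normalize(part) for part in doc) for doc in docs]
    concepts = [docs[i] for i in range(k)]  # assumption: first k documents seed the clusters
    labels = [0] * len(docs)
    for _ in range(iterations):
        # Assignment step: each document joins its most similar concept triplet.
        labels = [
            max(range(k), key=lambda j: similarity(doc, concepts[j], weights))
            for doc in docs
        ]
        # Update step: sum each component over the members, then renormalize it.
        for j in range(k):
            members = [doc for doc, lab in zip(docs, labels) if lab == j]
            if not members:
                continue  # keep the previous concept for an empty cluster
            concepts[j] = tuple(
                normalize([sum(col) for col in zip(*(m[p] for m in members))])
                for p in range(len(members[0]))
            )
    return labels, concepts

# Toy data: (word vector, out-link vector, in-link vector) per document.
docs = [
    ([1, 1, 0, 0], [1, 0], [1, 0]),
    ([0, 0, 1, 1], [0, 1], [0, 1]),
    ([1, 0, 0, 0], [1, 0], [1, 0]),
    ([0, 0, 0, 1], [0, 1], [0, 1]),
]
weights = (0.5, 0.25, 0.25)  # hypothetical weights for words, out-links, in-links
labels, concepts = toric_kmeans(docs, k=2, weights=weights)
print(labels)
```

In this sketch each pass touches every document once per cluster, so for a fixed number of clusters and a fixed vector size the cost of an iteration grows linearly with the number of documents.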
