Density link-based methods for clustering web pages

The World Wide Web is a huge information space, which makes it a valuable resource for decision making. To serve that purpose, however, it must be managed effectively. One important management technique is clustering web data. In this paper, we propose developments in clustering methods that achieve higher cluster quality. First, we study a new density-based method adapted for hierarchical clustering of web documents. Then, utilizing the hyperlink structure of the web, we propose a new method that combines density concepts with the web graph. These algorithms have the advantage of low complexity and, as experimental results reveal, the resulting clusters are of high quality.
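The abstract does not spell out the proposed algorithms, so the following is only a generic illustration of how density concepts can be applied to a web graph, not the method of this paper. It is a minimal DBSCAN-style sketch in which a page counts as "dense" (a core page) when it has at least `min_neighbors` distinct link neighbors. The function name, the input format, and the `min_neighbors` parameter are all assumptions made for illustration.

```python
from collections import deque


def link_density_clusters(links, min_neighbors=2):
    """Group pages by DBSCAN-style expansion over an undirected link graph.

    links: dict mapping a page to the pages it links to (hypothetical format).
    Returns a dict mapping each page to a cluster id, or -1 for noise.
    """
    # Treat hyperlinks as undirected: in-links and out-links both count.
    neighbors = {}
    for src, targets in links.items():
        neighbors.setdefault(src, set())
        for dst in targets:
            if dst == src:
                continue  # ignore self-links
            neighbors[src].add(dst)
            neighbors.setdefault(dst, set()).add(src)

    labels = {}
    cluster_id = 0
    for page in sorted(neighbors):
        # Only an unlabeled core page can seed a new cluster.
        if page in labels or len(neighbors[page]) < min_neighbors:
            continue
        labels[page] = cluster_id
        queue = deque([page])
        while queue:
            current = queue.popleft()
            if len(neighbors[current]) < min_neighbors:
                continue  # border page: joins the cluster but does not expand it
            for nxt in neighbors[current]:
                if nxt not in labels:
                    labels[nxt] = cluster_id
                    queue.append(nxt)
        cluster_id += 1

    # Pages never reached from any core page are treated as noise.
    for page in neighbors:
        labels.setdefault(page, -1)
    return labels
```

Calling `link_density_clusters({"a": ["b", "c"], "b": ["c"], "d": ["e"]})` would place the mutually linked pages `a`, `b`, and `c` in one cluster and mark the sparsely linked `d` and `e` as noise. A content-based density method would use a document similarity threshold in place of link adjacency.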
