Clustering of bipartite advertiser-keyword graph

We present top-down and bottom-up hierarchical clustering methods for large bipartite graphs. The top-down approach employs a flow-based graph partitioning method, while the bottom-up approach is a multi-round hybrid of the single-link and average-link agglomerative clustering methods. We evaluate the quality of the clusters obtained by these two methods using additional textual information, and we compare the results against other clustering techniques.
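To make the bottom-up idea concrete, the following is a minimal sketch of plain single-link clustering on a toy advertiser-keyword graph. It is not the multi-round hybrid described above. The Jaccard similarity on keyword sets, the threshold value, the function names, and the sample data are all assumptions chosen for illustration.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two keyword sets (assumed similarity measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def single_link_clusters(adv_keywords, threshold):
    """Single-link clustering cut at `threshold`.

    Two advertisers share a cluster if a chain of advertiser pairs,
    each with similarity >= threshold, connects them.
    """
    # Union-find over advertisers; each starts as its own cluster.
    parent = {a: a for a in adv_keywords}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Merge the clusters of every sufficiently similar pair.
    for a, b in combinations(adv_keywords, 2):
        if jaccard(adv_keywords[a], adv_keywords[b]) >= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for a in adv_keywords:
        clusters.setdefault(find(a), set()).add(a)
    return sorted(sorted(c) for c in clusters.values())


# Hypothetical bipartite graph: advertiser -> set of keywords it bids on.
graph = {
    "shoe_store": {"running shoes", "sneakers", "boots"},
    "sports_shop": {"running shoes", "sneakers", "tennis racket"},
    "outdoor_gear": {"boots", "tent", "sneakers"},
    "florist": {"roses", "bouquet"},
    "garden_center": {"roses", "seeds"},
}

clusters = single_link_clusters(graph, threshold=0.3)
print(clusters)
```

In this toy graph, `sports_shop` and `outdoor_gear` have a direct similarity of only 0.2, yet they land in the same cluster because each is linked to `shoe_store` with similarity 0.5. This chaining behavior is characteristic of single-link clustering, and average-link criteria are one common way to temper it.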
