Document Clustering, Visualization, and Retrieval via Link Mining

Clustering for document retrieval has traditionally relied on word-based similarity. However, this approach suffers from the ambiguity inherent in natural language (for example, synonymy and polysemy). Language-based processing can be augmented by analyzing the links among documents in a collection, such as hypertext Web links or literature citations. Indeed, early workers in information science recognized the shortcomings of word-based document processing, which led to the introduction of document processing based on literature citations [6]. An important development was the notion of co-citation [13], in which two documents are associated by being jointly cited (co-cited) by other documents. Clustering that uses co-citation as its similarity measure is generally known to correspond well to document semantics. Spurred by the popularity of the Web, more recent approaches analyze hyperlinks, though primarily for search engine page ranking [12, 8].
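To make the co-citation idea concrete, the following is a minimal sketch, not the method of this work. It assumes the citation data is a hypothetical dictionary mapping each citing document to the list of documents it cites. The co-citation count of a pair is the number of citing documents whose reference lists contain both (equivalently, the off-diagonal entries of AᵀA for a binary citation matrix A). The grouping step uses a simple threshold with single-linkage (connected components via union-find), chosen here only for illustration.

```python
from collections import defaultdict
from itertools import combinations


def cocitation_counts(citations):
    """Count how many citing documents cite each unordered pair of documents.

    citations: dict mapping a citing document ID to the IDs it cites
    (a hypothetical input format assumed for this sketch).
    Returns a dict keyed by sorted (doc_a, doc_b) tuples.
    """
    counts = defaultdict(int)
    for cited in citations.values():
        # Each pair in one reference list is co-cited once by this citing document;
        # set() ignores duplicate entries within a single reference list.
        for a, b in combinations(sorted(set(cited)), 2):
            counts[(a, b)] += 1
    return dict(counts)


def cluster_by_cocitation(citations, min_count):
    """Group cited documents whose co-citation count reaches min_count.

    This is single-linkage at a fixed threshold, implemented with union-find;
    it is an illustrative choice, not necessarily the clustering used in the paper.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for (a, b), n in cocitation_counts(citations).items():
        if n >= min_count:
            parent[find(a)] = find(b)  # merge the two components

    # Every cited document appears in some cluster, possibly as a singleton.
    docs = {d for cited in citations.values() for d in cited}
    groups = defaultdict(list)
    for d in sorted(docs):
        groups[find(d)].append(d)
    return sorted(groups.values())


example = {
    "p1": ["A", "B", "C"],
    "p2": ["A", "B"],
    "p3": ["B", "C", "D"],
    "p4": ["E"],
}
print(cocitation_counts(example)[("A", "B")])  # 2
print(cluster_by_cocitation(example, 2))  # [['A', 'B', 'C'], ['D'], ['E']]
```

In this toy example, A and B are co-cited by p1 and p2, and B and C by p1 and p3, so a threshold of 2 links A, B, and C into one cluster. D is co-cited only once, and E is never co-cited with anything.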

References

[1] Rajeev Motwani et al. The PageRank Citation Ranking: Bringing Order to the Web, 1999, WWW 1999.

[2] J. Lederberg et al. Toward a metric of science: the advent of science indicators, 1980.

[3] I. Daubechies. Ten Lectures on Wavelets, 1992.

[4] Harold H. Szu et al. Multiple-resolution clustering for recursive divide and conquer, 1997, Defense, Security, and Sensing.

[5] Tomasz Imielinski et al. Mining association rules between sets of items in large databases, 1993, SIGMOD Conference.

[6] Ivan Herman et al. Graph Visualization and Navigation in Information Visualization: A Survey, 2000, IEEE Trans. Vis. Comput. Graph.

[7] Joshua Lederberg et al. [Introduction to "Toward A Metric of Science: The Advent of Science Indicators"], 1979.

[8] Vijay V. Raghavan et al. Data mining and visualization of reference associations: higher order citation analysis, 2001.

[9] Truong Q. Nguyen et al. Wavelets and filter banks, 1996.

[10] James J. Thomas et al. Visualizing the non-visual: spatial analysis and interaction with information from text documents, 1995, Proceedings of Visualization 1995 Conference.

[11] Edward M. Reingold et al. Graph drawing by force-directed placement, 1991, Softw. Pract. Exp.

[12] Ramakrishnan Srikant et al. Fast algorithms for mining association rules, 1998, VLDB 1998.

[13] Les Carr et al. Trailblazing the literature of hypertext: author co-citation analysis (1989–1998), 1999, HYPERTEXT '99.

[14] Henry G. Small et al. Co-citation in the scientific literature: A new measure of the relationship between two documents, 1973, J. Am. Soc. Inf. Sci.

[15] Henry G. Small et al. Macro-level changes in the structure of co-citation clusters: 1983–1989, 2005, Scientometrics.

[16] Eugene Garfield et al. Citation data as science indicators, 1978.

[17] Vijay V. Raghavan et al. Visualizing association mining results through hierarchical clusters, 2001, Proceedings 2001 IEEE International Conference on Data Mining.

[18] J. R. Koehler et al. Modern Applied Statistics with S-Plus, 1996.

[19] Ramakrishnan Srikant et al. Fast Algorithms for Mining Association Rules in Large Databases, 1994, VLDB.