Exploring the Power of Links in Data Mining

Algorithms such as PageRank and HITS were developed in the late 1990s to analyze the links among Web pages and to discover authoritative pages and hubs. Links have also been widely used in citation analysis and social network analysis. We show that the power of links can be exploited thoroughly in data mining tasks such as classification, clustering, information integration, and object distinction. The talk introduces some recent results of our research that exploit the crucial information hidden in links, including (1) multi-relational classification, (2) user-guided clustering, (3) link-based clustering, and (4) object distinction analysis. The power of links in other analysis tasks will also be discussed.
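
To make the idea of "authoritative pages and hubs" concrete, here is a minimal sketch of the mutual-reinforcement iteration behind HITS. A page is a good authority if good hubs point to it, and a good hub if it points to good authorities. It assumes a small in-memory graph given as a dict from each page to the pages it links to. The function name `hits`, the fixed iteration count, and the toy graph are illustrative assumptions, not material from the talk.

```python
import math


def hits(links, iterations=50):
    """Minimal HITS sketch (illustrative only).

    links: dict mapping each page to an iterable of pages it links to.
    Returns (hub, auth) dicts of scores normalized to unit length.
    """
    # Collect every page that appears as a source or a target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)

    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Authority score: sum of hub scores of the pages linking to it.
        auth = {p: 0.0 for p in pages}
        for src, targets in links.items():
            for dst in targets:
                auth[dst] += hub[src]
        # Hub score: sum of authority scores of the pages it links to.
        hub = {p: sum(auth[dst] for dst in links.get(p, ())) for p in pages}
        # Normalize both vectors to unit length so scores stay bounded.
        for scores in (auth, hub):
            norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
            for p in scores:
                scores[p] /= norm
    return hub, auth


if __name__ == "__main__":
    # Hypothetical toy graph: a, b, c all cite x; a also cites y.
    hub, auth = hits({"a": ["x", "y"], "b": ["x"], "c": ["x"]})
    print(max(auth, key=auth.get), max(hub, key=hub.get))  # prints: x a
```

In the toy graph, `x` emerges as the top authority because every hub points to it, and `a` emerges as the top hub because it points to both authorities. PageRank differs in that it computes a single importance score per page via a random-surfer model rather than separate hub and authority scores.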
