Mining Heterogeneous Information Networks by Exploring the Power of Links

Knowledge is power but for interrelated data, knowledge is often hidden in massive links in heterogeneous information networks. We explore the power of links at mining heterogeneous information networks with several interesting tasks, including link-based object distinction, veracity analysis, multidimensional online analytical processing of heterogeneous information networks, and rank-based clustering. Some recent results of our research that explore the crucial information hidden in links will be introduced, including (1) Distinct for object distinction analysis, (2) TruthFinder for veracity analysis, (3) Infonet-OLAP for online analytical processing of information networks, and (4) RankClus for integrated ranking-based clustering. We also discuss some of our on-going studies in this direction.

[1]  Raymond T. Ng,et al.  Very large data bases , 1994 .

[2]  Sergey Brin,et al.  The Anatomy of a Large-Scale Hypertextual Web Search Engine , 1998, Comput. Networks.

[3]  M. KleinbergJon Authoritative sources in a hyperlinked environment , 1999 .

[4]  Rajeev Motwani,et al.  The PageRank Citation Ranking : Bringing Order to the Web , 1999, WWW 1999.

[5]  Luis Gravano,et al.  Approximate String Joins in a Database (Almost) for Free , 2001, VLDB.

[6]  Jennifer Widom,et al.  SimRank: a measure of structural-context similarity , 2002, KDD.

[7]  Rajeev Motwani,et al.  Robust and efficient fuzzy match for online data cleaning , 2003, SIGMOD '03.

[8]  Hamid Pirahesh,et al.  Data Cube: A Relational Aggregation Operator Generalizing Group-By, Cross-Tab, and Sub-Totals , 1996, Data Mining and Knowledge Discovery.

[9]  Philip S. Yu,et al.  Object Distinction: Distinguishing Objects with Identical Names , 2007, 2007 IEEE 23rd International Conference on Data Engineering.

[10]  Philip S. Yu,et al.  Truth Discovery with Multiple Conflicting Information Providers on the Web , 2007, IEEE Transactions on Knowledge and Data Engineering.

[11]  Philip S. Yu,et al.  Graph OLAP: Towards Online Analytical Processing on Graphs , 2008, 2008 Eighth IEEE International Conference on Data Mining.

[12]  Yizhou Sun,et al.  Ranking-based clustering of heterogeneous information networks with star network schema , 2009, KDD.

[13]  Yizhou Sun,et al.  RankClus: integrating clustering with ranking for heterogeneous information network analysis , 2009, EDBT '09.