Exploring the Power of Heuristics and Links in Multi-relational Data Mining

Relational databases are the most popular repository for structured data and are thus one of the richest sources of knowledge in the world. Because relational data is complex, designing efficient and scalable data mining approaches for relational databases is a challenging task. In this paper, we discuss two methodologies that address this issue. The first uses heuristics to guide the data mining procedure, avoiding aimless, exhaustive search in relational databases. The second assigns certain properties to each object in the database and lets different objects interact with each other along the links between them. Experiments show that both approaches achieve high efficiency and accuracy in real applications.
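To make the idea of objects interacting along links more concrete, here is a minimal sketch of property propagation over a link graph. It rests on our own assumptions: each object carries a single numeric property, links are given as an adjacency list, and in each round an object's value becomes a mix of its initial value and the mean of its linked objects' current values, weighted by a parameter `alpha`. This label-spreading-style update is a generic illustration, not the specific algorithm of the paper. The names `propagate`, `links`, `initial`, `alpha`, `max_iter`, and `tol`, and the toy data, are all hypothetical.

```python
from typing import Dict, List


def propagate(
    links: Dict[str, List[str]],
    initial: Dict[str, float],
    alpha: float = 0.5,
    max_iter: int = 100,
    tol: float = 1e-9,
) -> Dict[str, float]:
    """Let each object's property interact with those of its linked objects.

    Hypothetical update rule (an illustrative assumption, not the paper's method):
        new[v] = (1 - alpha) * initial[v] + alpha * mean(current[u] for u in links[v])
    Objects without links simply keep their initial value.
    """
    current = dict(initial)
    if not current:
        return current
    for _ in range(max_iter):
        new: Dict[str, float] = {}
        for v, init_val in initial.items():
            neighbors = links.get(v, [])
            if neighbors:
                # Average the current property values of the linked objects.
                neighbor_mean = sum(current[u] for u in neighbors) / len(neighbors)
                new[v] = (1 - alpha) * init_val + alpha * neighbor_mean
            else:
                new[v] = init_val
        # Stop once no object's value changes by more than tol.
        delta = max(abs(new[v] - current[v]) for v in new)
        current = new
        if delta < tol:
            break
    return current


# Toy example: two authors linked through one co-authored paper (hypothetical data).
links = {
    "alice": ["paper1"],
    "bob": ["paper1"],
    "paper1": ["alice", "bob"],
}
initial = {"alice": 1.0, "bob": 0.0, "paper1": 0.0}
scores = propagate(links, initial)
print({k: round(v, 4) for k, v in scores.items()})
# {'alice': 0.5833, 'bob': 0.0833, 'paper1': 0.1667}
```

Because `alpha` is below 1 and averaging never increases the largest deviation between two assignments, each round shrinks the distance to the fixed point by at least a factor of `alpha`, so the iteration converges. In the toy example, the paper's value settles between those of its two authors, and each author's value is pulled toward the other's through their shared link.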