MRC: Multi Relational Clustering Approach

Clustering is the process of partitioning data objects into groups based on similarity measures. Most existing methods cluster within a single table, yet most real-world databases store information in multiple tables. We propose a new method, Multi Relational Clustering (MRC), for clustering a relational database. Tables in a database are related to each other through foreign keys, and MRC divides them into two categories: dependent and independent. A dependent table holds the attributes of its own entities together with fields that refer to entities in other tables; that is, it includes one or more foreign keys. MRC first clusters the independent tables and then uses those results to cluster the dependent tables. Each table is clustered with an existing clustering algorithm with respect to its own fields. An important feature of MRC is its ability to cluster several tables in parallel. The approach is simple and can be implemented efficiently in SQL. We present a SQL implementation of k-Means and use it to cluster a database with MRC. Our experiments show that MRC is efficient for clustering a very large database in a relational environment.
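The two-phase order described above (independent tables first, then dependent tables) can be illustrated with a minimal sketch. Everything named here is hypothetical and not taken from the paper: the `customer`, `product`, and `purchase` tables, their columns and rows, and the helpers `kmeans` and `cluster_query`. The sketch also makes two assumptions the abstract does not confirm: each foreign key in the dependent table is replaced by the cluster label of the row it references, and k-means runs in Python over rows fetched through SQL rather than inside SQL itself.

```python
import sqlite3

def kmeans(points, k, iters=20):
    """Plain k-means on a list of numeric tuples; returns one label per point."""
    # Deterministic start: the first k distinct points act as initial centroids.
    centroids = []
    for p in points:
        if p not in centroids:
            centroids.append(p)
        if len(centroids) == k:
            break
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [
            min(range(len(centroids)),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(len(centroids)):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels

def cluster_query(conn, query, out_table, k):
    """Cluster the rows of `query` (first column = id, rest = numeric features)
    and store the labels in `out_table` so later queries can join against them."""
    rows = conn.execute(query).fetchall()
    ids = [r[0] for r in rows]
    labels = kmeans([tuple(float(v) for v in r[1:]) for r in rows], k)
    conn.execute(f"CREATE TABLE {out_table} (id INTEGER PRIMARY KEY, cluster INTEGER)")
    conn.executemany(f"INSERT INTO {out_table} VALUES (?, ?)", zip(ids, labels))
    return dict(zip(ids, labels))

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (id INTEGER PRIMARY KEY, age REAL, income REAL);
CREATE TABLE product  (id INTEGER PRIMARY KEY, price REAL);
-- purchase is the dependent table: it carries two foreign keys.
CREATE TABLE purchase (id INTEGER PRIMARY KEY,
                       customer_id INTEGER REFERENCES customer(id),
                       product_id  INTEGER REFERENCES product(id),
                       quantity REAL);
INSERT INTO customer VALUES (1, 25, 20), (2, 27, 22), (3, 60, 90), (4, 62, 95);
INSERT INTO product  VALUES (1, 5), (2, 6), (3, 100), (4, 110);
INSERT INTO purchase VALUES (1, 1, 1, 1), (2, 2, 2, 2), (3, 3, 3, 9), (4, 4, 4, 8);
""")

# Phase 1: independent tables (no foreign keys) are clustered on their own fields.
customer_labels = cluster_query(
    conn, "SELECT id, age, income FROM customer ORDER BY id", "customer_cluster", 2)
product_labels = cluster_query(
    conn, "SELECT id, price FROM product ORDER BY id", "product_cluster", 2)

# Phase 2: the dependent table is clustered on its own field plus the
# cluster labels of the rows its foreign keys point to (an assumption).
purchase_labels = cluster_query(conn, """
    SELECT p.id, p.quantity, cc.cluster, pc.cluster
    FROM purchase p
    JOIN customer_cluster cc ON cc.id = p.customer_id
    JOIN product_cluster  pc ON pc.id = p.product_id
    ORDER BY p.id
""", "purchase_cluster", 2)

print(customer_labels, product_labels, purchase_labels)
```

The two phase-1 calls do not depend on each other, which is the property that would allow independent tables to be clustered in parallel; the sketch runs them one after the other for simplicity. Treating a cluster label as a numeric feature is only reasonable here because there are two clusters, so the label behaves like a binary indicator.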
