Clustering Relational Data: A Transactional Approach

A methodology for clustering multi-relational data is proposed. First, tuple linkages in the database schema of the multi-relational entities are leveraged to virtually organize the available relational data into transactions, i.e., sets of feature-value pairs. The identified transactions are then partitioned into homogeneous groups. Each discovered cluster is equipped with a representative, which explains the corresponding group of transactions in terms of the feature-value pairs that are most likely to appear in a transaction belonging to that group. Outlier data are placed into a trash cluster, which is finally partitioned to mitigate the dissimilarity between the trash cluster and the previously generated clusters.
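The first and third steps above can be illustrated with a minimal sketch. It is not the authors' implementation: the toy schema (`customers`, `orders`), the helper names `build_transactions` and `representative`, and the frequency threshold `min_fraction` are all assumptions made for illustration. In particular, the abstract only says that a representative contains the pairs "most likely to appear" in a cluster; a simple frequency cutoff is used here as a stand-in.

```python
from collections import Counter

# Hypothetical toy schema: customers(id, city) and orders(customer_id, product).
customers = [
    {"id": 1, "city": "Rome"},
    {"id": 2, "city": "Rome"},
    {"id": 3, "city": "Oslo"},
]
orders = [
    {"customer_id": 1, "product": "tea"},
    {"customer_id": 1, "product": "milk"},
    {"customer_id": 2, "product": "tea"},
    {"customer_id": 3, "product": "beer"},
]


def build_transactions(targets, linked, key, foreign_key):
    """Follow the foreign-key linkage and collect, for each target tuple,
    the (feature, value) pairs of that tuple and of every tuple linked to it."""
    transactions = {}
    for row in targets:
        items = {(f, v) for f, v in row.items() if f != key}
        for other in linked:
            if other[foreign_key] == row[key]:
                items |= {(f, v) for f, v in other.items() if f != foreign_key}
        transactions[row[key]] = frozenset(items)
    return transactions


def representative(cluster, min_fraction=0.5):
    """Return the feature-value pairs that occur in at least `min_fraction`
    of the transactions in `cluster` (an assumed, simplified criterion)."""
    counts = Counter(item for t in cluster for item in t)
    return {item for item, c in counts.items() if c / len(cluster) >= min_fraction}


transactions = build_transactions(customers, orders, "id", "customer_id")

# Assume some clustering step has grouped customers 1 and 2 together.
rep = representative([transactions[1], transactions[2]])
strict_rep = representative([transactions[1], transactions[2]], min_fraction=1.0)
```

Here each customer becomes one transaction, and `strict_rep` keeps only the pairs shared by every transaction in the group. The partitioning step itself and the handling of the trash cluster are not shown.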
