Paper Information - Clustering Relational Data: A Transactional Approach

Clustering Relational Data: A Transactional Approach

A methodology for clustering multi-relational data is proposed. First, tuple linkages in the database schema of the multi-relational entities are leveraged to virtually organize the available relational data into transactions, i.e., sets of feature-value pairs. The identified transactions are then partitioned into homogeneous groups. Each discovered cluster is equipped with a representative, which explains the corresponding group of transactions in terms of the feature-value pairs most likely to appear in a transaction belonging to that group. Outlier data are placed into a trash cluster, which is finally partitioned to mitigate the dissimilarity between the trash cluster and the previously generated clusters.
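The first and third steps of the abstract (turning linked tuples into transactions, then summarizing a group by its most frequent feature-value pairs) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the toy `customers` and `orders` relations, the helper names, the Jaccard distance, and the `min_share` frequency threshold are hypothetical, and the abstract does not state which similarity measure or representative definition the paper actually uses.

```python
from collections import Counter

# Hypothetical toy relations; "cust_id" links orders to customers.
customers = [
    {"cust_id": 1, "city": "Rome"},
    {"cust_id": 2, "city": "Pisa"},
]
orders = [
    {"order_id": 10, "cust_id": 1, "product": "book"},
    {"order_id": 11, "cust_id": 1, "product": "pen"},
    {"order_id": 12, "cust_id": 2, "product": "book"},
]


def build_transactions(targets, linked, key, linked_name, skip=frozenset()):
    """Map each target tuple to a set of (feature, value) pairs.

    Pairs come from the target tuple itself and from every tuple in
    `linked` that shares the same `key` value. Columns listed in `skip`
    (for example surrogate identifiers) are left out.
    """
    transactions = {}
    for row in targets:
        items = {(f, v) for f, v in row.items() if f != key}
        for other in linked:
            if other[key] == row[key]:
                items |= {
                    (f"{linked_name}.{f}", v)
                    for f, v in other.items()
                    if f != key and f not in skip
                }
        transactions[row[key]] = frozenset(items)
    return transactions


def jaccard_distance(a, b):
    """Assumed dissimilarity between two transactions (not from the paper)."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0


def representative(cluster, min_share=0.5):
    """Keep the pairs that occur in at least `min_share` of the transactions."""
    counts = Counter(item for t in cluster for item in t)
    return frozenset(i for i, c in counts.items() if c / len(cluster) >= min_share)


transactions = build_transactions(
    customers, orders, "cust_id", "orders", skip={"order_id"}
)
```

With this toy data, each customer becomes one transaction that mixes its own attributes with those of its linked orders. A partitioning step would then group transactions by the chosen distance, and `representative` would summarize each resulting group; handling of the trash cluster is not sketched here.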
