Efficient clustering of databases induced by local patterns

Many large organizations operate multiple large databases because they transact from multiple branches. Most previous work on data mining considers a single database, so mining multiple databases needs to be studied in its own right. In this paper, we propose two measures of similarity between a pair of databases, along with an algorithm for clustering a set of databases. We improve the efficiency of the clustering process through three strategies: reducing the execution time of the clustering algorithm, using a more appropriate similarity measure, and storing frequent itemsets space-efficiently.
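The abstract does not define the two similarity measures or the clustering algorithm, so the following is only an illustrative sketch of the general idea of clustering databases by their local patterns. It assumes a Jaccard-style overlap between the frequent itemsets of two databases and a simple threshold-based grouping; both choices, and the names `itemset_similarity`, `cluster_databases`, and `fis_by_db`, are hypothetical stand-ins rather than the measures or algorithm proposed in the paper.

```python
from itertools import combinations


def itemset_similarity(fis_a, fis_b):
    """Jaccard-style overlap of two collections of frequent itemsets.

    Assumption: this is a generic stand-in, not one of the paper's measures.
    """
    a, b = set(fis_a), set(fis_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_databases(fis_by_db, threshold):
    """Group databases that are linked by a pairwise similarity >= threshold.

    Assumption: single-link grouping via union-find, chosen for brevity.
    """
    parent = {name: name for name in fis_by_db}

    def find(x):
        # Follow parent links to the cluster representative, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(fis_by_db, 2):
        if itemset_similarity(fis_by_db[a], fis_by_db[b]) >= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for name in fis_by_db:
        clusters.setdefault(find(name), set()).add(name)
    return sorted(sorted(members) for members in clusters.values())


# Toy frequent itemsets for three hypothetical branch databases.
fis_by_db = {
    "D1": {frozenset({"a"}), frozenset({"b"}), frozenset({"a", "b"})},
    "D2": {frozenset({"a"}), frozenset({"b"}), frozenset({"a", "b"}), frozenset({"c"})},
    "D3": {frozenset({"x"}), frozenset({"y"})},
}

print(itemset_similarity(fis_by_db["D1"], fis_by_db["D2"]))
print(cluster_databases(fis_by_db, threshold=0.5))
```

In this toy input, D1 and D2 share three of four distinct frequent itemsets and end up in one group, while D3 shares none and stays alone. A measure that also weighs itemset supports would be a natural refinement, but the abstract gives no detail on that point.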
