Database classification for multi-database mining

Many large organizations maintain multiple databases distributed across different branches, which makes multi-database mining an important data mining task. To reduce the cost of searching the data in all databases, we need to identify which databases are most likely to be relevant to a given data mining application. This step is referred to as database selection. In real-world settings, database selection has to be carried out repeatedly to identify the relevant databases for each different application. Moreover, a mining task may not refer to any specific application at all. In this paper, we present an efficient approach for classifying multiple databases based on their similarity to one another. Our approach is application-independent.
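To make the idea of application-independent classification concrete, the following is a minimal sketch, not the method of the paper. It assumes each database is summarized by the set of items it contains, measures similarity between two databases as the Jaccard overlap of those item sets, and puts two databases in the same class when they are linked by a chain of pairs whose similarity reaches a threshold. The names `jaccard`, `classify`, and `alpha`, the grouping rule, and the toy data are all illustrative assumptions.

```python
from itertools import combinations


def jaccard(a, b):
    """Similarity of two item sets: |a & b| / |a | b| (assumed measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def classify(databases, alpha):
    """Group database names into classes.

    Two databases are linked when their similarity is at least alpha;
    classes are the connected components of that link graph. This is a
    simplifying assumption, not the paper's classification procedure.
    """
    names = list(databases)
    parent = {n: n for n in names}  # union-find forest

    def find(n):
        # Follow parent pointers to the root, compressing the path.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for x, y in combinations(names, 2):
        if jaccard(databases[x], databases[y]) >= alpha:
            parent[find(x)] = find(y)

    groups = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical branch databases, each reduced to its item set.
databases = {
    "D1": {"a", "b", "c"},
    "D2": {"a", "b", "d"},
    "D3": {"x", "y"},
    "D4": {"x", "y", "z"},
}

print(classify(databases, alpha=0.5))
```

Because the grouping depends only on the databases themselves and on the threshold, it does not need to be recomputed for each new mining application. Raising `alpha` yields more, smaller classes.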
