Efficient Classification from Multiple Heterogeneous Databases

With the fast expansion of computer networks, it is inevitable to study data mining on heterogeneous databases. In this paper we propose MDBM, an accurate and efficient approach for classification on multiple heterogeneous databases. We propose a regression-based method for predicting the usefulness of inter-database links that serve as bridges for information transfer, because such links are automatically detected and may or may not be useful or even valid. Because of the high cost of inter-database communication, MDBM employs a new strategy for cross-database classification, which finds and performs actions with high benefit-to-cost ratios. The experiments show that MDBM achieves high accuracy in cross-database classification, with much higher efficiency than previous approaches.

[1]  Hendrik Blockeel,et al.  Top-Down Induction of First Order Logical Decision Trees , 1998, AI Commun..

[2]  Erhard Rahm,et al.  A survey of approaches to automatic schema matching , 2001, The VLDB Journal.

[3]  Yehuda Lindell,et al.  Privacy Preserving Data Mining , 2002, Journal of Cryptology.

[4]  Anders Krogh,et al.  Introduction to the theory of neural computation , 1994, The advanced book program.

[5]  Pedro M. Domingos,et al.  iMAP: discovering complex semantic matches between database schemas , 2004, SIGMOD '04.

[6]  Chris Clifton,et al.  Privacy-preserving distributed mining of association rules on horizontally partitioned data , 2004, IEEE Transactions on Knowledge and Data Engineering.

[7]  Luc De Raedt,et al.  Top-down induction of logical decision trees , 1997 .

[8]  Theodore Johnson,et al.  Mining database structure; or, how to build a data quality browser , 2002, SIGMOD '02.

[9]  Benny Pinkas,et al.  Privacy Preserving Data Mining , 2000, Journal of Cryptology.

[10]  Weixiong Zhang Search techniques , 2002 .

[11]  Jan M. Zytkow,et al.  Handbook of Data Mining and Knowledge Discovery , 2002 .

[12]  Mihir Bellare Advances in Cryptology — CRYPTO 2000 , 2000, Lecture Notes in Computer Science.

[13]  R. Mike Cameron-Jones,et al.  FOIL: A Midterm Report , 1993, ECML.

[14]  Gu Si-yang,et al.  Privacy preserving association rule mining in vertically partitioned data , 2006 .

[15]  Chris Clifton,et al.  Privacy-preserving k-means clustering over vertically partitioned data , 2003, KDD '03.

[16]  David Wai-Lok Cheung,et al.  Efficient Mining of Association Rules in Distributed Databases , 1996, IEEE Trans. Knowl. Data Eng..

[17]  Philip S. Yu,et al.  CrossMine: efficient classification across multiple database relations , 2004, Proceedings. 20th International Conference on Data Engineering.

[18]  Peter Clark,et al.  Rule Induction with CN2: Some Recent Improvements , 1991, EWSL.