Paper Information - Pattern Discovery in Distributed Databases

Pattern Discovery in Distributed Databases

Most algorithms for learning and pattern discovery in data assume that all the needed data is available on one computer at a single site. This assumption does not hold when a number of independent databases reside on geographically distributed nodes of a computer network. These databases cannot be moved to a single site because of size, security, privacy, and data-ownership concerns, yet together they constitute the dataset in which patterns must be discovered. Some pattern discovery algorithms can be adapted to such situations, while others become inefficient or inapplicable. In this paper we show how a decision-tree induction algorithm may be adapted for distributed data situations. We also discuss some general issues relating to the adaptability of other pattern discovery algorithms to distributed data situations.
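To make the idea of adapting decision-tree induction concrete, the following is a minimal sketch under stated assumptions, not the algorithm from the paper. It assumes the data is partitioned by rows (every site holds records with the same attributes), that each site may share only aggregate counts, and that splits are scored by information gain. The function names (`local_counts`, `information_gain`, `best_attribute`) and the toy records are hypothetical.

```python
from collections import Counter
from math import log2


def local_counts(rows, attribute, label):
    """Runs at one site: count (attribute value, class) pairs.

    Only this small count table leaves the site, never the raw rows.
    """
    return Counter((row[attribute], row[label]) for row in rows)


def entropy(class_counts):
    """Shannon entropy (in bits) of a class-count table."""
    total = sum(class_counts.values())
    return -sum((n / total) * log2(n / total)
                for n in class_counts.values() if n)


def information_gain(site_counts):
    """Runs at the coordinator: merge per-site counts and score one attribute."""
    merged = Counter()
    for counts in site_counts:
        merged.update(counts)  # Counter.update adds counts together

    by_class = Counter()
    by_value = {}
    for (value, cls), n in merged.items():
        by_class[cls] += n
        by_value.setdefault(value, Counter())[cls] += n

    total = sum(by_class.values())
    remainder = sum(sum(c.values()) / total * entropy(c)
                    for c in by_value.values())
    return entropy(by_class) - remainder


def best_attribute(sites, attributes, label):
    """Pick the split attribute for one tree node using only per-site counts."""
    gains = {
        a: information_gain([local_counts(rows, a, label) for rows in sites])
        for a in attributes
    }
    return max(gains, key=gains.get), gains


# Hypothetical toy data held at two separate sites.
site_a = [
    {"outlook": "sunny", "windy": "no", "play": "no"},
    {"outlook": "rain", "windy": "no", "play": "yes"},
]
site_b = [
    {"outlook": "sunny", "windy": "yes", "play": "no"},
    {"outlook": "rain", "windy": "yes", "play": "yes"},
]

chosen, gains = best_attribute([site_a, site_b], ["outlook", "windy"], "play")
```

Because the counts are additive across sites, the coordinator in this sketch obtains the same gain it would compute on the pooled data. Building a full tree would repeat the step for each node, with every site restricting its rows to that node's branch conditions. Data split by attributes rather than by rows would need a different exchange scheme.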

Raj Bhatnagar | Sriram Srinivasan
