A Framework for Mining Functional Dependencies from Large Distributed Databases

Discovering functional dependencies (FDs) from existing databases is important for knowledge discovery, machine learning, and data quality assessment. A number of algorithms have been proposed in the literature for FD discovery. However, these algorithms are designed for centralized databases. When they are applied to distributed databases, the communication cost of transporting data from different sites makes them inefficient. In this paper, we analyze the characteristics of mining functional dependencies from large distributed databases, and we propose a distributed mining framework for discovering FDs in large distributed databases. We develop a theorem that prunes candidate FDs effectively, and we extend the partition-based approach to distributed databases.
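As background for the partition-based approach mentioned above, the following minimal sketch shows its core test in a single, centralized table, in the style of TANE. Rows are grouped into equivalence classes by their values on an attribute set, and X -> A holds exactly when partitioning by X yields the same number of classes as partitioning by X ∪ {A}. This works because the second partition always refines the first. The function names `partition` and `fd_holds` and the dict-per-row representation are illustrative assumptions. This is not the distributed framework or the pruning theorem of this paper.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Row = Dict[str, object]  # hypothetical row representation: attribute name -> value


def partition(rows: Sequence[Row], attrs: Sequence[str]) -> List[List[int]]:
    """Group row indices into equivalence classes by their values on attrs."""
    classes: Dict[Tuple[object, ...], List[int]] = defaultdict(list)
    for i, row in enumerate(rows):
        classes[tuple(row[a] for a in attrs)].append(i)
    return list(classes.values())


def fd_holds(rows: Sequence[Row], lhs: Sequence[str], rhs: str) -> bool:
    """Check X -> A: the partition by X ∪ {A} refines the partition by X,
    so the two are equal exactly when they have the same number of classes."""
    return len(partition(rows, lhs)) == len(partition(rows, list(lhs) + [rhs]))


if __name__ == "__main__":
    rows = [
        {"zip": "10001", "city": "NYC", "name": "a"},
        {"zip": "10001", "city": "NYC", "name": "b"},
        {"zip": "94105", "city": "SF", "name": "c"},
    ]
    print(fd_holds(rows, ["zip"], "city"))   # zip determines city
    print(fd_holds(rows, ["city"], "name"))  # city does not determine name
```

In a distributed database, computing these partitions naively would require shipping the relevant columns to one site. That transport cost is what the abstract identifies as the bottleneck for centralized algorithms.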