Using evidence theory for the integration of distributed databases

Distributed databases allow us to integrate data from different sources which have not previously been combined. In this article, we are concerned with the situation where the data sources are held in a distributed database. Integration of the data is then accomplished using the Dempster–Shafer representation of evidence. The weighted sum operator is developed and this operator is shown to provide an appropriate mechanism for the integration of such data. This representation is particularly suited to statistical samples which may include missing values and be held at different levels of aggregation. Missing values are incorporated into the representation to provide lower and upper probabilities for propositions of interest. The weighted sum operator facilitates combination of samples with different classification schemes. Such a capability is particularly useful for knowledge discovery when we are searching for rules within the concept hierarchy, defined in terms of probabilities or associations. By integrating information from different sources, we may thus be able to induce new rules or strengthen rules which have already been obtained. We develop a framework for describing such rules and show how we may then integrate rules at a high level without having to resort to the raw data, a useful facility for knowledge discovery where efficiency is of the essence. © 1997 John Wiley & Sons, Inc.