Reducing uncertainties in data mining

Data mining, also referred to as knowledge discovery in databases, has attracted much research interest. Mining across independently developed databases often involves uncertain information. Such uncertainties can arise both when relations are combined and when tuples are merged. We propose a framework in which these uncertainties can be measured. The objective is to determine the best way to combine relations and merge tuples across multiple databases while avoiding the generation of unexpected uncertainties. Shannon entropy plays a key part in our approach to reducing uncertainties when merging related tuples in a combined relation. Detailed examples are provided to address key issues.
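
As a rough illustration of how Shannon entropy can serve as an uncertainty measure when choosing among ways to merge tuples, consider the minimal sketch below. It is not the framework itself: the candidate names (merge_a, merge_b, merge_c), their probability distributions, and the helper least_uncertain are hypothetical, and the sketch assumes each candidate merge leaves an attribute with a known discrete distribution over its possible values.

```python
import math


def shannon_entropy(probs):
    """Return the Shannon entropy, in bits, of a discrete distribution."""
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    # Zero-probability outcomes contribute nothing to the entropy.
    return sum(-p * math.log2(p) for p in probs if p > 0)


def least_uncertain(options):
    """Return the name of the option whose distribution has the lowest entropy."""
    return min(options, key=lambda name: shannon_entropy(options[name]))


# Hypothetical data: each candidate merge leaves an attribute with this
# distribution over its possible values (assumed for illustration only).
candidates = {
    "merge_a": [0.5, 0.5],  # two equally likely values: 1 bit of uncertainty
    "merge_b": [0.9, 0.1],  # one value strongly favoured
    "merge_c": [1.0],       # value fully determined: no uncertainty
}

best = least_uncertain(candidates)
```

Under these assumed distributions, the candidate that leaves the attribute fully determined has zero entropy and would be preferred; a lower entropy means less residual uncertainty in the merged tuple.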