Impurity measures in databases

Abstract. Starting from the notion of an impurity measure, we introduce purity dependencies as generalizations of functional dependencies in relational databases. The impurity of a subset of a set relative to a partition of that set, together with the relative impurity of two partitions, allows us to define the relative impurity of two attribute sets of a table in a relational database and to introduce purity dependencies. We discuss properties of these dependencies that generalize similar properties of functional dependencies, and we highlight their relevance for approximate classifications. Finally, we present an algorithm that mines datasets for these dependencies.
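To make the chain of definitions concrete (subset impurity, then relative impurity of two partitions, then relative impurity of two attribute sets), here is a minimal sketch. The abstract does not fix a particular impurity measure or aggregation, so this sketch assumes the Gini index as the impurity of a block relative to a partition, a size-weighted mean over the blocks as the relative impurity of two partitions, and a numeric threshold for deciding whether a purity dependency holds. All function names (`partition`, `gini_impurity`, `relative_impurity`, `attribute_impurity`, `purity_dependency_holds`) are hypothetical and are not taken from the paper.

```python
from collections import Counter, defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple

Row = Dict[str, Hashable]
Partition = List[List[int]]


def partition(rows: Sequence[Row], attrs: Sequence[str]) -> Partition:
    """Group row indices into blocks of rows that agree on every attribute in attrs."""
    blocks: Dict[Tuple[Hashable, ...], List[int]] = defaultdict(list)
    for i, row in enumerate(rows):
        blocks[tuple(row[a] for a in attrs)].append(i)
    return list(blocks.values())


def gini_impurity(block: Sequence[int], labels: Dict[int, int]) -> float:
    """ASSUMED subset impurity: Gini index of how the block's elements
    are spread over the blocks of the reference partition."""
    counts = Counter(labels[i] for i in block)
    n = len(block)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def relative_impurity(pi: Partition, sigma: Partition) -> float:
    """ASSUMED aggregation: size-weighted mean impurity of pi's blocks w.r.t. sigma."""
    labels = {i: k for k, block in enumerate(sigma) for i in block}  # row -> sigma block id
    total = sum(len(block) for block in pi)
    return sum(len(b) / total * gini_impurity(b, labels) for b in pi)


def attribute_impurity(rows: Sequence[Row], x: Sequence[str], y: Sequence[str]) -> float:
    """Relative impurity of attribute set x with respect to attribute set y."""
    return relative_impurity(partition(rows, x), partition(rows, y))


def purity_dependency_holds(rows: Sequence[Row], x: Sequence[str],
                            y: Sequence[str], threshold: float) -> bool:
    """HYPOTHETICAL criterion: x -> y holds if the impurity is at most threshold.
    With threshold 0.0 this is exactly the functional dependency x -> y."""
    return attribute_impurity(rows, x, y) <= threshold


# Toy table: A determines B exactly, but A determines C only approximately.
rows = [
    {"A": 1, "B": "x", "C": "p"},
    {"A": 1, "B": "x", "C": "q"},
    {"A": 2, "B": "y", "C": "p"},
    {"A": 2, "B": "y", "C": "p"},
]
print(attribute_impurity(rows, ["A"], ["B"]))  # 0.0 (A -> B is a functional dependency)
print(attribute_impurity(rows, ["A"], ["C"]))  # 0.25
```

Under these assumptions a block has Gini impurity 0 exactly when all its rows fall into one block of the reference partition, so zero relative impurity coincides with the ordinary functional dependency, and a positive threshold admits approximate ones. Swapping in another impurity measure (for example Shannon entropy) only changes `gini_impurity`.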