Compression-Based Evaluation of Partial Determinations

This work addresses the problem of finding partial determinations in databases and proposes a compression-based measure for evaluating them. Partial determinations can be viewed as generalizations of both functional dependencies and association rules, in that they are relational in nature and may have exceptions. Extending the measures used for evaluating association rules, namely support and confidence, to partial determinations leads to several problems. We therefore propose a measure based on the minimum description length (MDL) principle to remedy these problems. We consider the hypothetical task of transmitting a given database as efficiently as possible. The new measure estimates the compression achievable by transmitting partial determinations instead of the original data. It takes into account both the complexity and the correctness of a given partial determination, thus avoiding overfitting, especially in the presence of noise. We also describe three different kinds of search that use the new measure. Preliminary empirical results in a few Boolean domains are favorable.
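To make the transmission idea concrete, the following is a minimal sketch of one possible coding scheme. It is not the measure defined in this work: the function name `compression_gain`, the uniform code for attribute values, and the particular way exceptions are encoded are all assumptions chosen for illustration. The sketch compares the cost of sending a dependent column directly with the cost of sending a mapping from left-hand-side values to a right-hand-side value plus a list of exceptions.

```python
from collections import Counter, defaultdict
from math import comb, log2


def compression_gain(rows, lhs, rhs):
    """Bits saved by sending column `rhs` via the partial determination lhs -> rhs.

    Hypothetical coding scheme (an assumption, not the paper's formula):
      raw cost   = n * log2(number of distinct rhs values)
      model cost = one rhs value per distinct lhs combination
      exceptions = their count, their positions, and their true rhs values
    """
    n = len(rows)
    rhs_values = {r[rhs] for r in rows}
    bits_per_value = log2(len(rhs_values)) if len(rhs_values) > 1 else 0.0

    # Count rhs values for each combination of lhs values.
    groups = defaultdict(Counter)
    for r in rows:
        groups[tuple(r[a] for a in lhs)][r[rhs]] += 1

    # Rows that disagree with the majority rhs value of their group.
    exceptions = sum(sum(c.values()) - max(c.values()) for c in groups.values())

    raw_bits = n * bits_per_value
    model_bits = len(groups) * bits_per_value
    exception_bits = (
        log2(n + 1)                    # how many exceptions there are
        + log2(comb(n, exceptions))    # which rows they are
        + exceptions * bits_per_value  # their actual rhs values
    )
    return raw_bits - (model_bits + exception_bits)


# Toy Boolean table: c copies a, with one noisy row; b is irrelevant.
rows = [{"a": i % 2, "b": (i // 2) % 2, "c": i % 2} for i in range(16)]
rows[0]["c"] = 1

gain_a = compression_gain(rows, ["a"], "c")
gain_ab = compression_gain(rows, ["a", "b"], "c")
gain_b = compression_gain(rows, ["b"], "c")
```

On this toy table the sketch ranks `a -> c` above `a, b -> c`, because the extra attribute doubles the mapping size without removing the noisy exception, and it assigns `b -> c` a negative gain, because listing its many exceptions costs more than sending the column directly. This mirrors, under the stated assumptions, the trade-off between complexity and correctness described above.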