Bounding Negative Information in Frequent Sets Algorithms

In data mining applications of the frequent sets problem, such as finding association rules, a common generalization treats each transaction as the characteristic function of its itemset. This makes it possible to also find correlations involving items that are absent from transactions, but it risks producing a large and hard-to-interpret output. We propose a bottom-up algorithm in which the exploration of facts about absent items (negative information) is delayed with respect to facts about present items (positive information). This lets the user control how much correlation between absences of items is allowed in the association rules found. The algorithm takes advantage of the relationships between the frequencies of the corresponding itemsets. With a slight modification, it can also be used to find all frequent itemsets consisting of an arbitrary number of positive attributes and at most a predetermined number k of negative attributes.
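
As an illustration of the frequency relationships mentioned above, the following minimal sketch derives the support of an itemset with negated items from positive-only supports by inclusion-exclusion, for example supp(A, not b) = supp(A) - supp(A + {b}). It is not the algorithm proposed here: it enumerates candidates by brute force, with no delayed exploration and no pruning, and the function names (support, mixed_support, frequent_with_bounded_negatives) are hypothetical.

```python
from itertools import combinations


def support(transactions, items):
    """Number of transactions that contain every item in `items`."""
    items = frozenset(items)
    return sum(1 for t in transactions if items <= t)


def mixed_support(transactions, pos, neg):
    """Support of 'all of pos present and all of neg absent'.

    Computed only from positive supports via inclusion-exclusion:
    sum over subsets S of neg of (-1)^|S| * support(pos | S).
    """
    pos = frozenset(pos)
    neg = tuple(neg)
    total = 0
    for r in range(len(neg) + 1):
        for subset in combinations(neg, r):
            total += (-1) ** r * support(transactions, pos | frozenset(subset))
    return total


def frequent_with_bounded_negatives(transactions, min_support, k):
    """Brute-force listing of frequent (pos, neg) pairs with len(neg) <= k.

    Illustrative only: exponential in the number of items.
    """
    universe = sorted(set().union(*transactions))
    result = {}
    for n_neg in range(k + 1):
        for neg in combinations(universe, n_neg):
            rest = [i for i in universe if i not in neg]
            for n_pos in range(len(rest) + 1):
                for pos in combinations(rest, n_pos):
                    s = mixed_support(transactions, pos, neg)
                    if s >= min_support:
                        result[(frozenset(pos), frozenset(neg))] = s
    return result


# Toy data (assumed for illustration only).
transactions = [
    frozenset({"a", "b"}),
    frozenset({"a"}),
    frozenset({"b", "c"}),
    frozenset({"a", "b", "c"}),
]
# supp(a) = 3 and supp(a, b) = 2, so supp(a, not b) = 3 - 2 = 1.
a_without_b = mixed_support(transactions, {"a"}, {"b"})
```

Setting k = 0 reduces the listing to ordinary frequent itemsets, while larger k admits more correlations between absences, which is the knob the abstract describes.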