Measures on Boolean polynomials and their applications in data mining

We characterize measures on free Boolean algebras and examine the relationships between these measures and binary tables in relational databases. We show that these measures are completely defined by their values on positive conjunctions, and we use the method of indicators to obtain a formula that yields the value of a measure. We present an extension of the notion of support that is well suited to tables with missing values. Finally, we obtain Bonferroni-type inequalities that allow approximate evaluation of these measures for several types of queries. An approximation algorithm and an analysis of the results it produces are also included.
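
As an illustration only, and not the formula or the approximation algorithm of this paper, the following minimal sketch shows how the support of a disjunctive query on a binary table can be recovered from the supports of positive conjunctions by classical inclusion-exclusion, and how truncating the sum gives Bonferroni-type bounds. The function names `support` and `disjunction_support`, the `depth` parameter, and the toy table are assumptions introduced for this example.

```python
from fractions import Fraction
from itertools import combinations


def support(table, attrs):
    """Fraction of rows in which every attribute in attrs equals 1
    (the support of the positive conjunction of those attributes)."""
    hits = sum(1 for row in table if all(row[a] == 1 for a in attrs))
    return Fraction(hits, len(table))


def disjunction_support(table, attrs, depth=None):
    """Inclusion-exclusion sum for the support of (a1 OR ... OR ak).

    With depth=None the full sum is evaluated and the result is exact.
    Truncating after `depth` levels gives a classical Bonferroni bound:
    an upper bound for odd depth, a lower bound for even depth.
    """
    depth = len(attrs) if depth is None else depth
    total = Fraction(0)
    for size in range(1, depth + 1):
        sign = 1 if size % 2 == 1 else -1
        for subset in combinations(attrs, size):
            total += sign * support(table, subset)
    return total


# Hypothetical binary table with three attributes (columns 0, 1, 2).
table = [
    (1, 0, 0),
    (1, 1, 0),
    (0, 1, 1),
    (0, 0, 0),
    (1, 1, 1),
]

exact = disjunction_support(table, (0, 1, 2))           # full sum
upper = disjunction_support(table, (0, 1, 2), depth=1)  # odd depth: upper bound
lower = disjunction_support(table, (0, 1, 2), depth=2)  # even depth: lower bound
print(lower, exact, upper)  # 3/5 4/5 8/5
```

Only supports of positive conjunctions are ever read from the table, which is the kind of information a frequent-itemset miner provides; the truncated sums trade accuracy for fewer conjunctions to look up.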
