The complexity of satisfying constraints on databases of transactions

Computing frequent itemsets is one of the most prominent problems in data mining. Recently, a new related problem, called FREQSAT, was introduced and studied: given some itemset–interval pairs, does there exist a database such that, for every pair, the frequency of the itemset falls in the interval? In this paper, we extend the FREQSAT problem by further constraining the database with additional characteristics that are also given as part of the input: the maximal transaction length, the maximal number of transactions, and the maximal number of duplicates of a transaction. These extensions and all their combinations are studied in depth, and a hierarchy with respect to complexity is given. To complete the picture, we also study the cases where these characteristics are constant, i.e., bounded by a fixed constant that is not part of the input.
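To make the decision problem and its extensions concrete, the following is a minimal brute-force sketch for tiny instances only. It assumes that a database is a bag (multiset) of transactions, each transaction a subset of the items, and that the frequency of an itemset is the fraction of transactions containing it. The function names `frequency`, `satisfies` and `brute_force_freqsat` and their parameters are hypothetical illustrations, not the paper's algorithms. The search is exponential and needs an explicit bound on the number of transactions to terminate, so it says nothing about the complexity results studied in the paper.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations, combinations_with_replacement


def frequency(itemset, db):
    """Assumed definition: fraction of transactions in db that contain itemset."""
    hits = sum(1 for t in db if itemset <= t)
    return Fraction(hits, len(db))


def satisfies(db, constraints, max_len=None, max_trans=None, max_dup=None):
    """Check a candidate database against itemset-interval pairs and the
    optional bounds: transaction length, number of transactions, duplicates."""
    if not db:
        return False  # frequency is undefined on an empty database
    if max_trans is not None and len(db) > max_trans:
        return False
    if max_len is not None and any(len(t) > max_len for t in db):
        return False
    if max_dup is not None and max(Counter(db).values()) > max_dup:
        return False
    return all(lo <= frequency(i, db) <= hi for i, (lo, hi) in constraints)


def candidate_transactions(items, max_len=None):
    """All subsets of items (including the empty set) up to length max_len."""
    k = len(items) if max_len is None else min(max_len, len(items))
    return [frozenset(c) for r in range(k + 1)
            for c in combinations(sorted(items), r)]


def brute_force_freqsat(items, constraints, max_trans, max_len=None, max_dup=None):
    """Hypothetical exhaustive search: try every bag of at most max_trans
    candidate transactions and return the first satisfying one, else None."""
    cands = candidate_transactions(items, max_len)
    for n in range(1, max_trans + 1):
        for db in combinations_with_replacement(cands, n):
            if satisfies(list(db), constraints, max_len, max_trans, max_dup):
                return list(db)
    return None
```

For example, requiring `{a}` and `{b}` each to have frequency exactly 1/2 while `{a, b}` has frequency 0 is satisfied by the two-transaction database `[{a}, {b}]`. Requiring `{a}` to have frequency 2/3 over the single item `a` needs three transactions, two of which are identical `{a}` transactions, so it becomes unsatisfiable once at most one duplicate of each transaction is allowed. This illustrates how the extra bounds can change the answer.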
