Count Constraints and the Inverse OLAP Problem: Definition, Complexity and a Step toward Aggregate Data Exchange

A typical problem in database theory is to decide whether there exists a relation (or database) instance satisfying a given set of dependency constraints. This problem has recently received renewed interest in the context of data exchange, but constraints on aggregate data have received little attention so far, despite the relevance of aggregate operations in exchange systems. This paper introduces count constraints, which require the results of given count operations on a relation to fall within a certain range. Count constraints are defined by a suitable extension of first-order predicate calculus based on set terms, and they are then used in a new decision problem, Inverse OLAP: given a star schema, does there exist a relation instance satisfying a given set of count constraints? The new problem turns out to be NEXP-complete under each of the complexity measures considered: program complexity, data complexity and combined complexity. Count constraints can also be used in a data exchange setting, where data from the source database are transferred to the target database using aggregate operations.
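As an informal illustration of the decision problem, the sketch below treats a count constraint as a triple (pattern, lower bound, upper bound) over a single relation and searches exhaustively for a satisfying instance. This is a deliberately simplified reading, not the paper's formalism: the pattern-with-wildcards encoding, the set semantics for relations, and the names `count_matching`, `satisfies` and `find_instance` are all assumptions made for this example, and the set-term extension of first-order logic mentioned above is more expressive than what is shown here.

```python
from itertools import chain, combinations, product


def count_matching(relation, pattern):
    """Count tuples that agree with `pattern`; None acts as a wildcard."""
    return sum(
        all(p is None or p == v for p, v in zip(pattern, t))
        for t in relation
    )


def satisfies(relation, constraints):
    """Check that every count falls within its [low, high] range."""
    return all(
        low <= count_matching(relation, pattern) <= high
        for pattern, low, high in constraints
    )


def find_instance(domains, constraints):
    """Brute-force search for a relation instance (a set of tuples).

    Enumerates every subset of the cross product of the attribute
    domains, smallest first, so it is only usable on toy inputs.
    """
    universe = list(product(*domains))
    subsets = chain.from_iterable(
        combinations(universe, k) for k in range(len(universe) + 1)
    )
    for candidate in subsets:
        if satisfies(candidate, constraints):
            return set(candidate)
    return None


# Hypothetical two-attribute relation: (product, region).
domains = [("milk", "bread"), ("north", "south")]
constraints = [
    (("milk", None), 2, 2),   # exactly two tuples with product = milk
    ((None, "north"), 1, 1),  # exactly one tuple with region = north
]
witness = find_instance(domains, constraints)
```

The search visits every subset of the possible tuples, so its cost grows exponentially with the size of the attribute domains; it is meant only to make the question "does a satisfying instance exist?" concrete, not to suggest a practical procedure.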
