A First Attempt to Computing Generic Set Partitions: Delegation to an SQL Query Engine

Partitions are a common and useful way of organizing data in both data engineering and data mining. However, they currently lack efficient and generic data management functionality. This paper advances the understanding of this problem and proposes elements of a solution. We formulate the task as efficiently processing, evaluating, and optimizing queries over set partitions in the setting of relational databases. We first demonstrate that there is no trivial relational model for managing collections of partitions. We then formally motivate a relational encoding and show that not all operators of the partition lattice, nor all set-theoretic operations, can be expressed as queries of the relational algebra. We provide several pieces of evidence for the inefficiency of first-order (FO) queries, and our experimental results reinforce this evidence. We argue that there is a strong need for a dedicated system that manages set partitions, or that at least supplements an existing data management system to which both data persistence and query processing could be delegated.
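To make the expressiveness gap concrete, the following is a minimal sketch, not the encoding used in the paper. It assumes a hypothetical table `part(pid, elem, block)` that stores each partition as (element, block) pairs, and uses SQLite purely for illustration. Under that assumption, the meet of two partitions (the coarsest common refinement) is a single join query. The join of two partitions (the finest common coarsening) amounts to computing connected components, that is, a transitive closure, which is a standard example of a query outside plain relational algebra; the sketch therefore computes it on the client side with union-find. Whether this is the operator the authors have in mind is an assumption here.

```python
import sqlite3
from collections import defaultdict

def load(conn, pid, blocks):
    """Store a partition as (pid, elem, block) rows; block ids are list positions."""
    rows = [(pid, e, i) for i, blk in enumerate(blocks) for e in blk]
    conn.executemany("INSERT INTO part VALUES (?, ?, ?)", rows)

def as_blocks(pairs):
    """Turn (elem, block_label) pairs into a sorted list of sorted blocks."""
    groups = defaultdict(list)
    for elem, label in pairs:
        groups[label].append(elem)
    return sorted(sorted(g) for g in groups.values())

def meet(conn, p, q):
    """Meet: two elements share a block iff they share one in both p and q.
    A single relational join suffices; the new label is the pair of old labels."""
    sql = """SELECT a.elem, a.block || '|' || b.block
             FROM part a JOIN part b ON a.elem = b.elem
             WHERE a.pid = ? AND b.pid = ?"""
    return as_blocks(conn.execute(sql, (p, q)))

def join(conn, p, q):
    """Join: merge blocks of p and q that overlap, transitively.
    Done outside SQL with union-find, since it needs a transitive closure."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    rows = conn.execute(
        "SELECT pid, block, elem FROM part WHERE pid IN (?, ?) ORDER BY pid, block",
        (p, q),
    )
    first = {}  # first element seen in each (pid, block), used as its representative
    for pid, block, elem in rows:
        rep = first.setdefault((pid, block), elem)
        parent[find(elem)] = find(rep)
    return as_blocks([(e, find(e)) for e in list(parent)])

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE part (pid TEXT, elem INTEGER, block INTEGER)")
load(conn, "P", [[1, 2], [3, 4], [5, 6]])
load(conn, "Q", [[1, 2, 3], [4], [5], [6]])
print(meet(conn, "P", "Q"))  # [[1, 2], [3], [4], [5], [6]]
print(join(conn, "P", "Q"))  # [[1, 2, 3, 4], [5, 6]]
```

In this sketch the meet is delegated entirely to the query engine, while the join needs either procedural code or a recursive extension of SQL, which illustrates why a supplement to the relational engine may be required.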
