Efficient OLAP with UDFs

Since the early 1990s, On-Line Analytical Processing (OLAP) has been a well-studied research topic, with a focus on implementations outside the database, either in OLAP servers or entirely on client computers. Our approach computes and stores OLAP cubes using User-Defined Functions (UDFs) within a database management system. UDFs let users write their own code, which can then be called like any standard SQL function. By generating OLAP cubes within a UDF, we are able to create the entire lattice in main memory. The UDF also gives the user more control over the generation process than standard OLAP functions such as the CUBE operator. We introduce a data structure that not only creates an OLAP lattice efficiently in main memory but can also be adapted, with minimal change, to generate association rule itemsets. Using one real dataset and one synthetic dataset, we show experimentally that the UDF approach is more efficient than SQL. We also present several experiments showing that generating association rule itemsets with the UDF approach is comparable to a SQL approach. In this paper, we show that techniques such as OLAP and association rules can be pushed efficiently into UDFs and, in most cases, perform better than standard SQL functions.
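The abstract does not spell out the paper's data structure, so the following is only a minimal Python sketch under an assumed design: each cuboid of the lattice is a hash map from a tuple of dimension values to a [count, sum] cell, and one scan over the rows updates all 2^d cuboids in memory. This yields the same group-bys that SQL's `GROUP BY CUBE` would produce. The second function, `count_itemsets`, shows one way such a structure could be adapted to itemsets: transaction items play the role of dimensions and each node's count is its support. All names (`build_lattice`, `count_itemsets`, the sample columns) are hypothetical, and the itemset counter is a brute-force enumeration, not Apriori and not the authors' method.

```python
from collections import defaultdict
from itertools import combinations


def build_lattice(rows, dims, measure):
    """Compute every cuboid of an OLAP lattice in one pass over `rows`.

    Hypothetical sketch: each cuboid (a subset of `dims`) maps a tuple of
    dimension values to a [count, sum(measure)] cell. This is not the
    paper's data structure.
    """
    # All 2^d subsets of the dimensions: the apex () up to the base cuboid.
    cuboids = [c for k in range(len(dims) + 1) for c in combinations(dims, k)]
    lattice = {c: defaultdict(lambda: [0, 0]) for c in cuboids}
    for row in rows:
        # Each input row contributes to exactly one cell in every cuboid.
        for c in cuboids:
            cell = lattice[c][tuple(row[d] for d in c)]
            cell[0] += 1
            cell[1] += row[measure]
    return lattice


def count_itemsets(transactions, max_size, min_support):
    """Adapt the same idea to itemsets: items act as dimensions and the
    count of each node is its support. Brute-force enumeration, not Apriori.
    """
    counts = defaultdict(int)
    for t in transactions:
        items = sorted(set(t))  # canonical order so each itemset has one key
        for k in range(1, max_size + 1):
            for itemset in combinations(items, k):
                counts[itemset] += 1
    # Keep only itemsets that meet the (absolute) minimum support.
    return {s: n for s, n in counts.items() if n >= min_support}


if __name__ == "__main__":
    rows = [
        {"region": "N", "product": "A", "sales": 10},
        {"region": "N", "product": "B", "sales": 5},
        {"region": "S", "product": "A", "sales": 7},
    ]
    lattice = build_lattice(rows, ("region", "product"), "sales")
    print(lattice[()][()])                # [3, 22]
    print(lattice[("region",)][("N",)])   # [2, 15]

    baskets = [["milk", "bread"], ["milk", "eggs"], ["milk", "bread", "eggs"]]
    print(count_itemsets(baskets, max_size=2, min_support=2))
```

Because the lattice has 2^d cuboids and every row touches all of them, keeping it entirely in main memory is practical only for a modest number of dimensions. Likewise, `count_itemsets` enumerates every subset up to `max_size` for each transaction, so `max_size` has to stay small.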

Carlos Ordonez, Zhibo Chen
