Optimizing Aggregate Query Processing in Cloud Data Warehouses

In this paper, we study and optimize aggregate query processing in a highly distributed Cloud Data Warehouse, where each database stores a subset of relational data in a star schema. Existing aggregate query processing algorithms (for example, the Two-phase algorithm) focus on optimizing individual query operations but give less weight to communication cost overhead. In cloud architectures, however, communication cost is an important factor in query processing. We therefore take communication overhead into account to improve distributed query processing in such cloud data warehouses. We then design query-processing algorithms by analyzing the aggregate operation and eliminating most of the sort and group-by operations with the help of integrity constraints and our proposed storage structures, PK-map and Tuple-index-map. Extensive experiments on PlanetLab cloud machines validate the effectiveness of our proposed framework in improving response time, reducing node-to-node interdependency, minimizing communication overhead, and reducing the database table accesses required for aggregate queries.
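To make the communication-cost concern concrete, the following minimal sketch illustrates generic two-phase aggregation (local partial aggregates, then a merge at a coordinator) and counts how many rows each strategy would ship over the network. It is an illustration only, not the algorithm proposed in this paper: the function names, the three-node layout, and the sample data are all hypothetical, and PK-map and Tuple-index-map are not modeled.

```python
from collections import defaultdict

def local_aggregate(rows):
    """Phase 1: a node sums its own rows per group key."""
    partial = defaultdict(int)
    for key, value in rows:
        partial[key] += value
    return dict(partial)

def merge_partials(partials):
    """Phase 2: the coordinator merges the per-node partial sums."""
    total = defaultdict(int)
    for partial in partials:
        for key, value in partial.items():
            total[key] += value
    return dict(total)

# Hypothetical fact-table fragments on three nodes: (group key, measure).
nodes = [
    [("east", 10), ("west", 5), ("east", 7)],
    [("west", 3), ("west", 4)],
    [("east", 1), ("north", 8), ("north", 2)],
]

partials = [local_aggregate(rows) for rows in nodes]
result = merge_partials(partials)

# Rows sent to the coordinator under each strategy.
rows_shipped_naive = sum(len(rows) for rows in nodes)    # ship every raw row
rows_shipped_two_phase = sum(len(p) for p in partials)   # ship one row per local group

print(result, rows_shipped_naive, rows_shipped_two_phase)
```

In this toy setting, local pre-aggregation ships one row per group per node instead of every raw row. The saving shrinks as the number of distinct groups per node approaches the number of rows, which is one reason communication cost deserves explicit attention.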
