An Economical Query Cost Model in the Cloud

Cloud Computing ($\mathcal{C}\mathcal{C}$) brings a new approach to information technology (IT) consumption and is changing how enterprises and companies invest. As the reputation of $\mathcal{C}\mathcal{C}$ grows, a large number of applications managing large amounts of data are moving to the Cloud, which introduces several new dimensions (payment, query processing, etc.). Analytical data management applications are one example. They are intended for the decision support process, which requires complex queries. To optimize these queries, optimization structures such as indexes and materialized views are required. In traditional database infrastructures (centralized, parallel, distributed, etc.), the choice of the optimal configuration of optimization structures is usually guided by mathematical cost models, which are used to quantify the quality of the obtained solutions. The purpose of our work is to develop a cost model for selecting materialized views in the Cloud. The main characteristic of our cost model is that it considers both the payment cost and the query processing paradigm. Intensive experiments were conducted using our cost model and the obtained results are deployed in an assimilated Cloud infrastructure.
