Predicting cost amortization for query services

Emerging providers of online services offer access to data collections. Such data service providers need to build data structures, such as materialized views and indexes, to improve the performance of user query execution. The cost of these structures is charged to the user as part of the overall query service cost. To ensure the economic viability of the provider, the cost of building and maintaining new structures has to be amortized over the set of prospective query services that will use them. This work proposes a novel stochastic model that predicts the extent of cost amortization in terms of time and number of services. The model is complemented by a novel method that regresses query traffic statistics and provides input to the prediction model. To demonstrate the effectiveness of the prediction model, we study its application to an extension of an existing economy model for the management of a cloud DBMS. A thorough experimental study shows that the prediction model ensures the economic viability of the cloud DBMS while enabling fast and cheap query services.
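To make the notion of amortizing a structure's cost over prospective queries concrete, the following is a minimal illustrative sketch. It is not the stochastic model proposed in this work: it assumes a constant query arrival rate, a constant maintenance cost per time unit, and a fixed surcharge per query, and all function and parameter names are hypothetical.

```python
import math


def break_even(build_cost, maintenance_rate, arrival_rate, surcharge):
    """Estimate when a structure's cost is amortized (illustrative only).

    Assumptions (not taken from the paper):
      - queries that use the structure arrive at a constant `arrival_rate`
        (queries per time unit),
      - maintenance costs `maintenance_rate` per time unit,
      - every such query pays a fixed `surcharge`.

    Returns (number_of_queries, elapsed_time), or None if the surcharge
    never covers the ongoing maintenance cost.
    """
    if arrival_rate <= 0:
        raise ValueError("arrival_rate must be positive")
    # Maintenance cost accrued between two consecutive queries.
    maintenance_per_query = maintenance_rate / arrival_rate
    # Part of each surcharge that is left to pay off the build cost.
    net_per_query = surcharge - maintenance_per_query
    if net_per_query <= 0:
        return None  # the structure is never amortized
    n_queries = math.ceil(build_cost / net_per_query)
    return n_queries, n_queries / arrival_rate


if __name__ == "__main__":
    print(break_even(build_cost=100.0, maintenance_rate=2.0,
                     arrival_rate=4.0, surcharge=1.5))
```

Under these simplifying assumptions the extent of amortization can be read off directly in both dimensions named above, number of services and time. A stochastic model would replace the constant arrival rate with a predicted, time-varying one.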
