Evaluation on Applicability of Measurement-Based Join Cost Calculation Method Using Different Generation CPUs

Cloud computing is attracting growing attention, and users are demanding ever higher performance. In particular, the database-as-a-service (DaaS) market is expanding rapidly. Efforts to improve database performance have led to the use of nonvolatile memory, instead of conventional storage devices, as the durable database medium. In a cloud infrastructure, however, the CPUs of different physical servers may belong to different architecture generations. For database systems running on such an infrastructure, storing the database in nonvolatile memory lowers the cost of data access, so the cost of CPU operations becomes relatively more important when choosing the most suitable execution plan for a query. In our previous study, we proposed a measurement-based join cost calculation method for selecting the best join method. However, if that method is not portable to a database system whose CPU architecture differs from that of the CPU on which the statistical information was measured, the statistical information must be re-measured and the cost calculation model rebuilt. This study aims to develop a method for updating the measurement-based cost calculation formulas to support a CPU of a different architecture generation without re-measuring statistical information on that CPU. Our approach reflects architectural changes, such as cache size and associativity, memory latency, and branch misprediction penalty, in the components of the cost calculation formulas. The updated formulas accurately estimated join costs on a CPU of a different generation in 66% of the test cases. Therefore, the measurement-based cost calculation method can be applied to in-memory database systems running on CPUs of different generations.
