Surveying the landscape: an in-depth analysis of spatial database workloads

Spatial databases are increasingly important for a wide variety of real-world applications, such as land surveying, urban planning, cartography, and location-based services. However, the properties of spatial database workloads are not well understood. For example, it is unknown to what degree one spatial application resembles another in terms of resource demand, or how that demand changes as more concurrent queries (i.e., more users) are added. We show that spatial workloads have a different CPU execution profile than well-studied decision support workloads, as represented by TPC-H. We present a framework to automatically classify spatial queries and characterize spatial workload mixes. We first analyze the resource consumption (i.e., computation and I/O) of a representative set of spatial queries, which we then classify into five distinct categories. Next, we create five homogeneous spatial workloads, each composed of queries from one of these classes. We then vary database-specific parameters (e.g., the buffer pool size) and workload-specific parameters (e.g., the query mix) to characterize a workload in terms of CPU utilization and I/O activity trends. We study workloads simulating real-world spatial database applications and show how our framework can classify them and predict resource utilization trends under various settings. These results can indicate to the database administrator which resources are heavily contended and can guide resource upgrades. We further validate our approach by applying it to a much larger dataset and to a second DBMS.
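The classification step described above can be illustrated with a minimal sketch. The abstract does not say which algorithm the framework uses, so this example assumes plain k-means over two normalized features per query (CPU share and I/O share). The query ids, the feature values, and the function names `kmeans` and `classify_queries` are all hypothetical and chosen only for illustration; only the number of classes (five) comes from the text.

```python
from math import dist


def farthest_point_init(points, k):
    """Choose k starting centroids deterministically: take the first point,
    then repeatedly add the point farthest from all centroids chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        nxt = max(points, key=lambda p: min(dist(p, c) for c in centroids))
        centroids.append(nxt)
    return centroids


def kmeans(points, k, max_iterations=50):
    """Plain k-means (assumed technique, not taken from the paper).
    Returns one cluster label per point plus the final centroids."""
    centroids = farthest_point_init(points, k)
    labels = None
    for _ in range(max_iterations):
        # Assign each point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: dist(p, centroids[j])) for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centroids are too
        labels = new_labels
        # Move each centroid to the mean of its assigned points.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centroids


def classify_queries(profiles, k=5):
    """Map each query id to a class label based on its (cpu, io) profile."""
    ids = list(profiles)
    labels, _ = kmeans([profiles[q] for q in ids], k)
    return dict(zip(ids, labels))


# Hypothetical, made-up profiles: (normalized CPU demand, normalized I/O demand).
profiles = {
    "q01": (0.90, 0.10), "q02": (0.88, 0.12),  # CPU-heavy
    "q03": (0.10, 0.90), "q04": (0.12, 0.88),  # I/O-heavy
    "q05": (0.50, 0.50), "q06": (0.52, 0.48),  # balanced
    "q07": (0.10, 0.10), "q08": (0.12, 0.08),  # light on both
    "q09": (0.90, 0.90), "q10": (0.88, 0.92),  # heavy on both
}

classes = classify_queries(profiles, k=5)
for query_id, label in sorted(classes.items()):
    print(query_id, "-> class", label)
```

Grouping the query ids by label would then give the raw material for homogeneous workloads of the kind the abstract describes, one per class. With real measurements, the features would need to be normalized to comparable scales before clustering.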
