Decision-support workload characteristics on a clustered database server from the OS perspective

A range of database services are being offered on clusters of workstations today to meet the demanding needs of applications with voluminous datasets, high computational and I/O requirements, and a large number of users. The underlying database engine runs on cost-effective off-the-shelf hardware and software components that may not be tailored or tuned for these applications. At the same time, many of these databases have legacy code that may not be easy to adapt to the evolving capabilities and limitations of clusters. An in-depth understanding of the interaction between these database engines and the underlying operating system (OS) can identify a set of characteristics that would be extremely valuable for future research on systems support for these environments. To our knowledge, no prior work has undertaken such a characterization for a clustered database server. Using the trial version of IBM DB2 Universal Database (UDB) Extended Enterprise Edition (EEE) V7.2 and TPC-H-like decision-support queries, this paper studies numerous issues by evaluating performance on an off-the-shelf Pentium/Linux cluster connected by Myrinet. These include detailed performance profiles of all kernel activities, as well as qualitative and quantitative insights into the interaction between the database engine and the operating system.
