Design and evaluation of smart disk architecture for DSS commercial workloads

The storage and computational requirements of large-scale applications are increasing rapidly. Clusters appear to be the most attractive architecture for such applications because of their low cost and high scalability. At the same time, smart disk systems, with their large storage capacities and growing computational power, are becoming increasingly popular. In this work, we compare the performance of these two architectures with that of a single host-based system, using representative queries from Decision Support System (DSS) databases. We show how to implement individual database operations in the smart disk system, and how to optimize the execution of a whole query by bundling frequently occurring operations together and executing the bundle in a single invocation. Besides decreasing the overall execution time, operation bundling also offers an easy-to-program and easy-to-use interface for accessing the data on smart disks. We also present a protocol for minimizing the communication time in the smart disk based system. To measure response times, we have developed DBsim, an accurate simulator that can simulate database operations for the single host-based, cluster-based, and smart disk based systems. Using this simulator, we show that the smart disk architecture offers substantial benefits in terms of overall query execution times for the TPC-D benchmark suite. In particular, in our base configuration, the average response time of the smart disk architecture for the representative TPC-D queries is 71% lower than that of the single host-based system and 4.2% lower than that of the fastest cluster architecture. We also demonstrate the effectiveness of operation bundling.
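
The idea of operation bundling can be illustrated with a small sketch. The code below is not the paper's interface and is unrelated to DBsim; the class `SmartDisk`, its methods, and the counters `invocations` and `items_sent` are hypothetical names introduced only for this illustration. It assumes a toy model in which every separately invoked operation returns its intermediate result to the host, whereas a bundle of scan, select, and aggregate runs on the disk in a single invocation and returns only the final value.

```python
from dataclasses import dataclass


@dataclass
class SmartDisk:
    """Toy model of one smart disk (hypothetical, for illustration only)."""
    rows: list            # (key, value) tuples stored on this disk
    invocations: int = 0  # host-to-disk requests served so far
    items_sent: int = 0   # items shipped back to the host so far

    # Unbundled operations: each call is one invocation, and its
    # intermediate result travels back to the host.
    def scan(self):
        self.invocations += 1
        self.items_sent += len(self.rows)
        return list(self.rows)

    def select(self, rows, pred):
        self.invocations += 1
        out = [r for r in rows if pred(r)]
        self.items_sent += len(out)
        return out

    def aggregate(self, rows):
        self.invocations += 1
        self.items_sent += 1
        return sum(value for _, value in rows)

    # Bundled operation: scan, select, and aggregate in one invocation;
    # only the final aggregate leaves the disk.
    def scan_select_aggregate(self, pred):
        self.invocations += 1
        self.items_sent += 1
        return sum(value for key, value in self.rows if pred((key, value)))


def run_unbundled(disk, pred):
    """Host drives three separate invocations."""
    rows = disk.scan()
    rows = disk.select(rows, pred)
    return disk.aggregate(rows)


def run_bundled(disk, pred):
    """Host issues a single bundled invocation."""
    return disk.scan_select_aggregate(pred)


if __name__ == "__main__":
    data = [(i, i * 10) for i in range(10)]
    wanted = lambda row: row[0] >= 5

    a = SmartDisk(rows=data)
    b = SmartDisk(rows=data)
    print(run_unbundled(a, wanted), a.invocations, a.items_sent)
    print(run_bundled(b, wanted), b.invocations, b.items_sent)
```

Both paths compute the same aggregate; in this toy model the bundled path differs only in issuing fewer invocations and sending fewer items to the host. The counters are a stand-in for cost and say nothing about the actual response times reported above.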
