Processor-embedded distributed smart disks for I/O-intensive workloads: architectures, performance models and evaluation

Processor-embedded disks, or smart disks, with their network interface controllers, can in effect be viewed as processing elements with on-disk memory and secondary storage. The data sizes and access patterns of today's large I/O-intensive workloads require architectures whose processing power scales with growing storage capacity. To address this need, we propose and evaluate disk-based distributed smart storage architectures. Based on analytically derived performance models, our evaluation with representative workloads shows that offloading processing to the disks and performing point-to-point data communication improve performance over centralized architectures. Our results also demonstrate that distributed smart disk systems scale well and can efficiently handle I/O-intensive workloads such as commercial decision support database (TPC-H) queries, association rule mining, data clustering, and two-dimensional fast Fourier transform, among others.
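To make the offloading argument concrete, the sketch below uses a deliberately simple bottleneck model. It is an illustration under stated assumptions, not the analytical models derived in this work. It assumes fully pipelined stages (so the slowest stage sets the run time), a single shared interconnect between the disks and the host, and a hypothetical `selectivity` parameter giving the fraction of scanned data that still has to leave the disk after on-disk filtering. All parameter names and values (`SystemParams`, `disk_cpu_rate`, and so on) are hypothetical, and the model captures only the filtering benefit of offloading, not disk-to-disk point-to-point exchange.

```python
from dataclasses import dataclass


@dataclass
class SystemParams:
    """Hypothetical parameters for a toy bottleneck model (not the paper's model)."""
    n_disks: int            # number of disks in the system
    data_per_disk: float    # bytes stored and scanned on each disk
    disk_bw: float          # media transfer rate per disk, bytes/s
    net_bw: float           # shared interconnect bandwidth to the host, bytes/s
    host_rate: float        # host processing rate, bytes/s
    disk_cpu_rate: float    # embedded disk processor rate, bytes/s
    selectivity: float      # fraction of scanned data shipped after on-disk filtering


def centralized_time(p: SystemParams) -> float:
    """Host pulls every byte over the interconnect and does all the processing."""
    total = p.n_disks * p.data_per_disk
    read = p.data_per_disk / p.disk_bw      # disks read their partitions in parallel
    transfer = total / p.net_bw             # all raw data crosses the shared link
    compute = total / p.host_rate           # a single host processes everything
    # Assumption: stages are fully pipelined, so the slowest stage dominates.
    return max(read, transfer, compute)


def smart_disk_time(p: SystemParams) -> float:
    """Each disk filters its own partition and ships only the reduced result."""
    total = p.n_disks * p.data_per_disk
    read = p.data_per_disk / p.disk_bw
    compute = p.data_per_disk / p.disk_cpu_rate   # disks compute in parallel
    transfer = p.selectivity * total / p.net_bw   # only filtered data crosses the link
    return max(read, compute, transfer)


if __name__ == "__main__":
    # Hypothetical values chosen only to show the trend as disks are added.
    for n in (4, 16, 64):
        p = SystemParams(n_disks=n, data_per_disk=1e9, disk_bw=50e6, net_bw=100e6,
                         host_rate=200e6, disk_cpu_rate=40e6, selectivity=0.05)
        c, s = centralized_time(p), smart_disk_time(p)
        print(f"{n:3d} disks: centralized {c:7.1f} s, "
              f"smart disks {s:6.1f} s, speedup {c / s:5.2f}x")
```

In this toy setting the centralized time grows linearly with the number of disks because every byte crosses one shared link, while the smart-disk time stays bounded by per-disk read and compute until the filtered traffic itself saturates the link (at 64 disks with these values). Real gains depend on workload selectivity and on how much computation the embedded processors can absorb.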