Reducing server data traffic using a hierarchical computation model

Commercial workloads impose heavy demands on a server's memory and storage subsystems and often generate a large amount of traffic on the I/O and memory buses. To reduce data movement between the storage subsystem and the processing units, we propose a hierarchical computing (HC) system that distributes processing elements across the storage hierarchy. We present a programming model that decomposes database queries into simple operations. These operations are then distributed to and executed by the different layers of the hierarchy, depending on the affinity of each task to a particular layer. Commands percolate down into the lower layers of the hierarchy, and partially processed information flows up into the higher layers, where subsequent operations can be performed. We evaluate the effectiveness of the proposed hierarchical computing model with full-system simulations of a business decision support system (DSS) workload. On a group of TPC-H-like queries, hierarchical computing systems reduce the amount of data transferred over the processor-to-memory interconnect by 37 to 58 percent. We also observe that HC configurations achieve speedups between 1.14x and 1.45x compared with a CC-NUMA system with 32 processors.
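As a rough illustration of the idea of pushing simple operations down the hierarchy, the following Python sketch compares a conventional plan (the lowest layer ships every row upward) with a hierarchical plan (the lowest layer applies the selection and ships only matching rows). Everything in it is an assumption made for illustration: the function names, the two-layer split, the fixed `ROW_BYTES` row size, and the toy table are hypothetical and are not taken from the paper's programming model or its simulator, and the byte counts it produces are unrelated to the reported results.

```python
from dataclasses import dataclass

ROW_BYTES = 16  # assumed fixed row size, for illustration only


@dataclass
class Row:
    key: int
    price: float


def storage_layer_scan(rows):
    """Lowest layer (hypothetical): return every row unfiltered."""
    return list(rows)


def storage_layer_filter(rows, predicate):
    """Lowest layer (hypothetical): apply the selection locally and
    ship only the matching rows upward."""
    return [r for r in rows if predicate(r)]


def processor_layer_sum(rows):
    """Highest layer (hypothetical): aggregate the rows it receives."""
    return sum(r.price for r in rows)


def run_conventional(rows, predicate):
    # All rows cross the interconnect; filtering happens at the top.
    shipped = storage_layer_scan(rows)
    total = processor_layer_sum([r for r in shipped if predicate(r)])
    return total, len(shipped) * ROW_BYTES


def run_hierarchical(rows, predicate):
    # The selection is pushed down; only matches cross the interconnect.
    shipped = storage_layer_filter(rows, predicate)
    total = processor_layer_sum(shipped)
    return total, len(shipped) * ROW_BYTES


def keep_every_fourth(row):
    return row.key % 4 == 0


table = [Row(key=i, price=float(i)) for i in range(100)]
conv_total, conv_bytes = run_conventional(table, keep_every_fourth)
hier_total, hier_bytes = run_hierarchical(table, keep_every_fourth)

print(conv_total, conv_bytes)  # same answer, all rows shipped
print(hier_total, hier_bytes)  # same answer, only matching rows shipped
```

Both plans compute the same aggregate; only the number of bytes that cross the modeled interconnect differs. How much a real query benefits depends on its selectivity and on which operations each layer can execute.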
