Accelerating DSS Workloads through Coherence Protocols

In this work, we analyze how a DSS (Decision Support System) workload can be accelerated on a shared-bus shared-memory multiprocessor by adding simple support to the classical MESI coherence protocol. The DSS workload was set up using the TPC-D benchmark on the PostgreSQL DBMS. The analysis was performed via trace-driven simulation, and operating system effects are also considered in our evaluation. We analyzed a basic four-processor machine and a high-end sixteen-processor machine, implementing MESI and two coherence protocols that deal with the migration of processes and data: PSCR and AMSD. Results show that, even in the four-processor case, PSCR outperforms the other protocols because of its lower bus utilization, which follows from the absence of invalidation misses once the contribution of passive sharing is eliminated. In the 16-processor case, with the bus near saturation, the gain of PSCR becomes more significant: its advantage can be quantified at 10% relative to the other evaluated protocols.
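To make the bus-traffic argument concrete, the following is a minimal sketch of a textbook MESI state machine for a single cache line on a snooping bus. It is not the simulator used in this work and it does not implement PSCR or AMSD; the class name `ToyMesiBus` and its methods are hypothetical. It assumes that passive sharing refers to private data of a migrated process that remains in the cache of the processor the process ran on previously.

```python
from enum import Enum


class State(Enum):
    M = "Modified"
    E = "Exclusive"
    S = "Shared"
    I = "Invalid"


class ToyMesiBus:
    """Hypothetical toy model: one cache line, n snooping caches, plain MESI."""

    def __init__(self, n_caches):
        self.states = [State.I] * n_caches
        self.transactions = 0  # bus operations issued so far

    def read(self, cpu):
        if self.states[cpu] is not State.I:
            return  # read hit: no bus traffic
        self.transactions += 1  # read miss: bus read
        holders = [i for i, s in enumerate(self.states)
                   if i != cpu and s is not State.I]
        for i in holders:
            self.states[i] = State.S  # remote M/E/S copies become Shared
        self.states[cpu] = State.S if holders else State.E

    def write(self, cpu):
        if self.states[cpu] is State.M:
            return  # write hit on a Modified line: no bus traffic
        if self.states[cpu] is State.E:
            self.states[cpu] = State.M  # silent upgrade, no bus traffic
            return
        # Shared or Invalid: an invalidation (or read-for-ownership) is needed
        self.transactions += 1
        for i in range(len(self.states)):
            if i != cpu:
                self.states[i] = State.I
        self.states[cpu] = State.M


def migration_demo():
    """A process writes private data on CPU 0, then migrates to CPU 1."""
    bus = ToyMesiBus(4)
    bus.write(0)  # first touch on CPU 0
    bus.write(0)  # further writes hit locally
    bus.read(1)   # after migration: miss, stale copy on CPU 0 becomes Shared
    bus.write(1)  # write to private data still costs an invalidation
    return bus


if __name__ == "__main__":
    print(migration_demo().transactions)
```

In this sketch, the last write goes on the bus only because CPU 0 still holds a copy that the migrated process will never use again. Under the stated assumption, this is the kind of overhead that a protocol aimed at passive sharing tries to avoid.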
