Performance analysis of electronic commerce multiprocessor server

We evaluate the performance of an electronic commerce server, i.e., a system running electronic commerce applications, on a shared-bus multiprocessor architecture, focusing on the design of the memory subsystem. We analyzed the common case of a system that uses the MESI coherence protocol to keep the processors' private caches coherent. We evaluated the miss ratio and the bus traffic of such a system while varying cache size, number of ways, scheduling policy, and number of processors, and related the results to the different types of data sharing generated by the application or the kernel. We found that passive sharing and false sharing are the major sources of coherence overhead with relatively large caches (over 1 Mbyte). False sharing is mainly due to kernel data and can be eliminated with appropriate data-structure design techniques. A scheduling technique such as cache affinity can reduce passive sharing, but it is not effective under every load condition. A dedicated coherence protocol could therefore be a better way to eliminate passive-sharing overhead completely and boost performance.
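
The abstract does not say which data-structure design techniques are meant. One common technique is to pad and align per-processor data so that items written by different processors never share a cache line. The C sketch below illustrates this under stated assumptions: the 64-byte line size, the processor count, and all names (`CACHE_LINE`, `NCPU`, `packed_counters`, `padded_counter`, `share_line`) are hypothetical and not taken from the paper.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stdalign.h> /* alignas */
#include <stdint.h>

#define CACHE_LINE 64 /* assumed cache-line size in bytes */
#define NCPU 4        /* hypothetical number of processors */

/* Unpadded layout: four 8-byte counters fit in a single 64-byte line,
 * so a write by one processor invalidates the copies cached by the
 * others even though each processor touches only its own counter. */
struct packed_counters {
    uint64_t hits[NCPU];
};

/* Padded layout: the alignment requirement rounds the struct size up
 * to one full line, so each array element occupies its own line. */
struct padded_counter {
    alignas(CACHE_LINE) uint64_t hits;
};

static_assert(sizeof(struct padded_counter) == CACHE_LINE,
              "each padded counter must fill exactly one cache line");

/* Index of the cache line that contains address p. */
static uintptr_t line_of(const void *p)
{
    return (uintptr_t)p / CACHE_LINE;
}

/* Nonzero if the two addresses fall in the same cache line. */
static int share_line(const void *a, const void *b)
{
    return line_of(a) == line_of(b);
}
```

With a line-aligned `struct packed_counters`, `share_line(&c.hits[0], &c.hits[1])` is nonzero, whereas adjacent elements of a `struct padded_counter` array never share a line. The cost is memory: each 8-byte counter now occupies 64 bytes.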
