Evaluating optimizations for multiprocessor e-commerce servers running the TPC-W workload

We evaluate the performance of an electronic commerce server, i.e. a system running electronic commerce applications, on a shared-bus multiprocessor architecture. In particular, we focus on the memory subsystem design and on the analysis of coherence-related overhead when the software is set up as specified in the TPC-W benchmark. Our aim is to identify the main factors that limit performance in such a system and the main optimizations that can speed up the execution of an e-commerce workload on an SMP architecture. Our results show that: (i) kernel data structures need a careful redesign for large cache sizes; (ii) cache affinity is useful in reducing cold and replacement misses, but it is not effective under every load condition; (iii) passive sharing, i.e. the sharing induced by process migration, is a cause of performance degradation. A write-update protocol that correctly handles passive sharing (namely PSCR) has two beneficial effects: it increases performance in every situation and it increases system scalability (up to 20 processors are supported in our configuration).
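To make the notion of passive sharing concrete, the following toy model counts the bus update transactions generated after a process migrates from one processor to another under a write-update protocol. It is only an illustrative sketch and not the PSCR protocol itself: the function name, the trace, and the removal policy (the old cache drops its copy when it snoops the first update to a private line) are assumptions made for this example.

```python
def bus_updates_after_migration(private_lines, write_trace, remove_passive_copies):
    """Count bus updates caused by copies left behind in the old CPU's cache.

    Hypothetical toy model: 'private_lines' are cache lines private to a
    process that has just migrated; 'write_trace' lists the lines it writes
    on the new CPU.
    """
    # Lines the migrated process left behind in the old CPU's cache.
    stale_copies = set(private_lines)
    bus_updates = 0
    for line in write_trace:
        if line in stale_copies:
            # The old cache still holds a copy, so a write-update protocol
            # must broadcast this write on the shared bus.
            bus_updates += 1
            if remove_passive_copies:
                # Assumed policy: the old cache drops its useless copy
                # when it snoops the first update to a private line.
                stale_copies.discard(line)
    return bus_updates


private_lines = range(4)
write_trace = [0, 1, 2, 3] * 10  # the process rewrites its 4 private lines 10 times

plain = bus_updates_after_migration(private_lines, write_trace, False)
removal = bus_updates_after_migration(private_lines, write_trace, True)
print(plain, removal)  # prints "40 4"
```

In this sketch every write to a passively shared line costs a bus transaction under the plain write-update policy, while removing the stale copy limits the cost to one transaction per line. Real protocols must also distinguish private from actively shared data, which this model does not attempt.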
