Coherence maintenances to realize an efficient parallel processing for a cache memory with synchronization on a chip-multiprocessor

A chip-multiprocessor is a promising architecture for overcoming the limited ILP, high power consumption, and heat generation that current processors face. On a shared-memory multiprocessor, performance depends on efficient communication and synchronization through shared variables. The TSVM cache combines communication and synchronization with coherence maintenance on a chip-multiprocessor: communication and synchronization via shared variables are carried out by a single coherence transaction over a high-speed on-chip interconnect. The TSVM cache provides several instructions, each with its own coherence maintenance scheme. Combining these instructions realizes producer-consumers synchronization, mutual exclusion, and barrier synchronization, each together with communication, easily and systematically. This paper describes how these instructions are used to construct the three primitives and shows their effect using a clock-cycle-accurate simulator written in VHDL. The results show that the TSVM cache improves performance by a factor of 9.8 over a traditional cache memory and by a factor of 2 over a conventional cache memory with a synchronization mechanism.
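
For readers unfamiliar with the three primitives named above, the following is a minimal software sketch of what each one guarantees. It does not model the TSVM cache, its instructions, or its coherence transactions, none of which are specified in this abstract; it only uses Python's standard `threading` module as an analogue. The function name `run_demo`, its parameters, and the value 42 are hypothetical choices made for illustration.

```python
import threading


def run_demo(num_consumers=3, increments=1000):
    """Software analogue of the three primitives (not the TSVM hardware)."""
    shared = {"value": None, "counter": 0}
    received = [None] * num_consumers
    after_barrier = []

    ready = threading.Event()                    # producer-consumers signal
    lock = threading.Lock()                      # mutual exclusion
    barrier = threading.Barrier(num_consumers)   # barrier synchronization

    def producer():
        # Communication and synchronization together: write the shared
        # variable, then release every waiting consumer.
        shared["value"] = 42
        ready.set()

    def consumer(i):
        # Producer-consumers: block until the produced value is available.
        ready.wait()
        received[i] = shared["value"]

        # Mutual exclusion: only one thread updates the counter at a time.
        for _ in range(increments):
            with lock:
                shared["counter"] += 1

        # Barrier: nobody proceeds until all consumers finish their updates.
        barrier.wait()
        with lock:
            after_barrier.append(shared["counter"])

    threads = [threading.Thread(target=consumer, args=(i,))
               for i in range(num_consumers)]
    threads.append(threading.Thread(target=producer))
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return received, shared["counter"], after_barrier


if __name__ == "__main__":
    print(run_demo())
```

In this sketch each primitive costs separate library calls and, on real hardware, separate memory traffic for the data and for the synchronization state. The abstract's claim is that the TSVM cache folds both into one coherence transaction.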
