Paper information - Coherence maintenances to realize an efficient parallel processing for a cache memory with synchronization on a chip-multiprocessor

A chip-multiprocessor is a promising architecture for overcoming the limits on instruction-level parallelism (ILP), the high power consumption, and the heat generation that current processors face. On a shared-memory multiprocessor, performance depends on efficient communication and synchronization via shared variables. The TSVM cache combines communication and synchronization with coherence maintenance on a chip-multiprocessor: communication and synchronization via shared variables are carried out by a single coherence transaction over a high-speed on-chip interconnect. The TSVM cache provides several instructions, each with its own coherence maintenance scheme. Combinations of these instructions realize producer-consumers synchronization, mutual exclusion, and barrier synchronization with communication easily and systematically. This paper describes how these instructions construct the three primitives and shows the effect of the primitives using a clock-cycle-accurate simulator written in VHDL. The results show that the TSVM cache improves performance by a factor of 9.8 compared with a traditional cache memory, and by a factor of 2 compared with a conventional cache memory with a synchronization mechanism.

Akira Yamawaki | Masahiko Iwane
