Paper Information - CMP Cache Architecture and the OpenMP Performance

The chip multiprocessor (CMP) is regarded as the next generation of microprocessor architecture. For programming such machines, OpenMP, a standard shared-memory programming model, is a strong candidate. This raises a question: how should CMP hardware be designed to deliver high performance for OpenMP applications? This work explores the answer using cache architecture as a case study. Using a simulator, we investigate how cache organization and reconfigurability influence the parallel execution of an OpenMP program. The results can guide architecture developers in making hardware design decisions and programmers in writing efficient code.

Wolfgang Karl | Jie Tao | Kim D. Hoàng
