Memory System Design for a Multi-core Processor

Multi-core processors have recently become an active research area. Keeping multiple copies of data consistent across caches is costly in multi-core processors, and even more so in many-core processors. A hybrid memory architecture is proposed for a specific multi-core processor: it uses a cache for instructions and local storage for data. This paper focuses on the design and optimization of the proposed memory architecture. The L1 instruction cache, local data storage, DMA engine, L2 cache, and MMU are designed and optimized. The L2 cache replacement strategy is studied with the aim of reducing the total miss cost.
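To illustrate what reducing total miss cost (rather than miss count) can mean for a replacement strategy, the following is a minimal sketch and not the design studied in the paper. It assumes a fully associative cache, a per-block miss cost, and a hypothetical policy that evicts the cheapest-to-refetch block among the `window` least recently used blocks. The names `simulate`, `window`, `trace`, and `cost` are illustrative assumptions.

```python
from collections import OrderedDict


def simulate(trace, cost, capacity, window=1):
    """Return the total miss cost of a fully associative cache.

    window=1 behaves as plain LRU. window>1 evicts the block with the
    lowest miss cost among the `window` least recently used blocks
    (a hypothetical cost-aware policy, used here only for illustration).
    """
    cache = OrderedDict()  # keys ordered from least to most recently used
    total = 0
    for addr in trace:
        if addr in cache:
            cache.move_to_end(addr)  # hit: mark as most recently used
            continue
        total += cost[addr]  # miss: pay this block's miss cost
        if len(cache) >= capacity:
            candidates = list(cache)[:window]  # oldest blocks first
            victim = min(candidates, key=lambda a: cost[a])
            del cache[victim]
        cache[addr] = True
    return total


# Assumed example: block A is expensive to refetch, B and C are cheap.
cost = {"A": 10, "B": 1, "C": 1}
trace = ["A", "B", "C", "A", "B", "C"]

lru_cost = simulate(trace, cost, capacity=2, window=1)         # 24
cost_aware_cost = simulate(trace, cost, capacity=2, window=2)  # 14
```

On this small trace every access misses under plain LRU, including the expensive refetch of A, while the cost-aware variant keeps A resident and only pays for the cheap blocks. Real L2 designs are set-associative and must bound the hardware cost of tracking per-block miss costs, which this sketch ignores.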
