Design and early evaluation of a 3-D die stacked chip multi-vector processor

Modern vector processors have significant advantages over commodity-based scalar processors for memory-intensive scientific applications. However, vector processors still retain a single-core architecture, even though chip multiprocessors (CMPs) have become mainstream in recent processor designs. To realize more efficient and powerful computation on a vector processor, this paper proposes a 3-D stacked chip multi-vector processor (CMVP) that combines a chip multi-vector processor architecture with coarse-grain die-stacking technology. The 3-D stacked CMVP consists of I/O layers, vector core layers, and vector cache layers. The I/O layer significantly improves off-chip memory bandwidth, and the vector core layer makes it possible to integrate many vector cores on a die. The vector cache layer increases on-chip memory capacity and provides high memory bandwidth. By decreasing the number of off-chip memory accesses, it improves performance and reduces energy consumption. Performance evaluation using real scientific and engineering applications shows the potential of the 3-D stacked CMVP. Moreover, this paper shows that, on the 3-D stacked CMVP, introducing the vector cache is more energy-efficient than increasing the off-chip memory bandwidth to achieve the same sustained performance.
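
The energy argument in the last sentence can be illustrated with a first-order, roofline-style model. This is a minimal sketch and not the paper's evaluation methodology. The names `CMVPConfig`, `sustained_gflops` and `memory_energy_pj_per_flop` are hypothetical. All numbers are also hypothetical, chosen only for illustration: the peak rate, bandwidths, hit rate, bytes-per-FLOP demand and per-byte access energies. The model assumes that only vector cache misses consume off-chip bandwidth and that the cache layer itself is never the bottleneck.

```python
from dataclasses import dataclass


@dataclass
class CMVPConfig:
    """Hypothetical first-order parameters; none of these values come from the paper."""
    peak_gflops: float   # aggregate peak of all vector cores, GFLOP/s
    offchip_gbps: float  # off-chip memory bandwidth provided by the I/O layer, GB/s
    hit_rate: float      # fraction of memory traffic served by the vector cache layer (0..1)


def sustained_gflops(cfg: CMVPConfig, bytes_per_flop: float) -> float:
    """Roofline-style bound in which only cache misses consume off-chip bandwidth.

    Assumes the vector cache layer is never the bottleneck.
    """
    offchip_bytes_per_flop = bytes_per_flop * (1.0 - cfg.hit_rate)
    if offchip_bytes_per_flop == 0.0:
        return cfg.peak_gflops  # every access hits on-chip; compute-bound
    return min(cfg.peak_gflops, cfg.offchip_gbps / offchip_bytes_per_flop)


def memory_energy_pj_per_flop(
    cfg: CMVPConfig,
    bytes_per_flop: float,
    e_offchip_pj_per_byte: float = 20.0,  # assumed cost of an off-chip byte
    e_onchip_pj_per_byte: float = 2.0,    # assumed cost of a vector-cache byte
) -> float:
    """Data-movement energy per FLOP under the assumed per-byte access costs."""
    offchip = bytes_per_flop * (1.0 - cfg.hit_rate) * e_offchip_pj_per_byte
    onchip = bytes_per_flop * cfg.hit_rate * e_onchip_pj_per_byte
    return offchip + onchip


if __name__ == "__main__":
    demand = 2.0  # hypothetical application demand: 2 bytes per FLOP
    configs = {
        "baseline": CMVPConfig(256.0, 256.0, 0.0),
        "2x bandwidth": CMVPConfig(256.0, 512.0, 0.0),
        "vector cache": CMVPConfig(256.0, 256.0, 0.5),
    }
    for name, cfg in configs.items():
        print(f"{name:>13}: {sustained_gflops(cfg, demand):6.1f} GFLOP/s, "
              f"{memory_energy_pj_per_flop(cfg, demand):5.1f} pJ/FLOP")
```

Under these assumed numbers, the baseline is bandwidth-bound at 128 GFLOP/s. Doubling the off-chip bandwidth and adding a cache with a 50% hit rate both reach the 256 GFLOP/s peak. The cached configuration, however, moves half as many bytes off-chip and spends 22 rather than 40 pJ per FLOP on data movement. The model ignores the extra I/O power that a wider off-chip interface would draw, which would likely widen that gap further.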
