The Macro-DSE for HPC Processing Unit: The Physical Constraints Perspective

With the rise of big data and cloud computing, microarchitecture has to deliver raw computing ability, throughput, low power, and low cost at the same time. Because non-recurring engineering (NRE) costs are so high, computer architects and processor designers rely on simulation tools and models to optimize the main processing unit. Design space exploration (DSE) is the methodology responsible for filtering the possible design choices. However, the thousands of parameters in a current multi-core processor make an exhaustive search too expensive to complete. Future high performance computing (HPC) will no longer insist on peak double-precision performance (DFP) alone, but also on high throughput and lightweight design. Depending on the level of detail considered, from the number of cores down to the size of an individual pipeline buffer, we can divide the DSE problem into a macro level and a micro level.
