论文信息 - Dark silicon and the end of multicore scaling

Dark silicon and the end of multicore scaling

Since 2005, processor designers have increased core counts to exploit Moore's Law scaling, rather than focusing on single-core performance. The failure of Dennard scaling, to which the shift to multicore parts is partially a response, may soon limit multicore scaling just as single-core scaling has been curtailed. This paper models multicore scaling limits by combining device scaling, single-core scaling, and multicore scaling to measure the speedup potential for a set of parallel workloads for the next five technology generations. For device scaling, we use both the ITRS projections and a set of more conservative device scaling parameters. To model single-core scaling, we combine measurements from over 150 processors to derive Pareto-optimal frontiers for area/performance and power/performance. Finally, to model multicore scaling, we build a detailed performance model of upper-bound performance and lower-bound core power. The multicore designs we study include single-threaded CPU-like and massively threaded GPU-like multicore chip organizations with symmetric, asymmetric, dynamic, and composed topologies. The study shows that regardless of chip organization and topology, multicore scaling is power limited to a degree not widely appreciated by the computing community. Even at 22 nm (just one year from now), 21% of a fixed-size chip must be powered off, and at 8 nm, this number grows to more than 50%. Through 2024, only 7.9× average speedup is possible across commonly used parallel workloads, leaving a nearly 24-fold gap from a target of doubled performance per generation.

[1] Mark Horowitz,et al. Energy-performance tradeoffs in processor architecture and circuit design: a marginal cost analysis , 2010, ISCA.

[2] Mark D. Hill,et al. Amdahl's Law in the Multicore Era , 2008, Computer.

[3] Kai Li,et al. The PARSEC benchmark suite: Characterization and architectural implications , 2008, 2008 International Conference on Parallel Architectures and Compilation Techniques (PACT).

[4] Pradeep Dubey,et al. Debunking the 100X GPU vs. CPU myth: an evaluation of throughput computing on CPU and GPU , 2010, ISCA.

[5] Shekhar Borkar. The Exascale challenge , 2010, Proceedings of 2010 International Symposium on VLSI Design, Automation and Test.

[6] Steven Swanson,et al. Conservation cores: reducing the energy of mature computations , 2010, ASPLOS XV.

[7] G. Amdhal,et al. Validity of the single processor approach to achieving large scale computing capabilities , 1967, AFIPS '67 (Spring).

[8] G.E. Moore,et al. Cramming More Components Onto Integrated Circuits , 1998, Proceedings of the IEEE.

[9] Coniferous softwood. GENERAL TERMS , 2003 .

[10] Henry Wong,et al. Analyzing CUDA workloads using a detailed GPU simulator , 2009, 2009 IEEE International Symposium on Performance Analysis of Systems and Software.

[11] Engin Ipek,et al. Core fusion: accommodating software diversity in chip multiprocessors , 2007, ISCA '07.

[12] Xi Yang,et al. Looking back on the language and hardware revolutions: measured power, performance, and scaling , 2011, ASPLOS XVI.

[13] Avi Mendelson,et al. Many-Core vs. Many-Thread Machines: Stay Away From the Valley , 2009, IEEE Computer Architecture Letters.

[14] Fred J. Pollack. New microarchitecture challenges in the coming generations of CMOS process technologies (keynote address)(abstract only) , 1999, MICRO.

[15] Jeong-Gun Lee,et al. An Asymptotic Performance/Energy Analysis and Optimization of Multi-core Architectures , 2009, ICDCN.

[16] S. Winkel. Optimal versus Heuristic Global Code Scheduling , 2007, 40th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO 2007).

[17] Rami G. Melhem,et al. Corollaries to Amdahl's Law for Energy , 2008, IEEE Computer Architecture Letters.

[18] Hsien-Hsin S. Lee,et al. Extending Amdahl's Law for Energy-Efficient Computing in the Many-Core Era , 2008, Computer.

[19] Gabriel H. Loh,et al. The Cost of Uncore in Throughput-Oriented Many-Core Processors , 2008 .

[20] Onur Mutlu,et al. Accelerating critical section execution with asymmetric multi-core architectures , 2009, ASPLOS.

[21] Babak Falsafi,et al. Power Scaling: the Ultimate Obstacle to 1K-Core Chips , 2010 .

[22] R.H. Dennard,et al. Design Of Ion-implanted MOSFET's with Very Small Physical Dimensions , 1974, Proceedings of the IEEE.

[23] Gurindar S. Sohi,et al. Over-provisioned multicore systems , 2008 .

[24] Mark Horowitz,et al. Scaling, Power and the Future of CMOS , 2007, 20th International Conference on VLSI Design held jointly with 6th International Conference on Embedded Systems (VLSID'07).

[25] K. Bernstein,et al. Scaling, power, and the future of CMOS , 2005, IEEE InternationalElectron Devices Meeting, 2005. IEDM Technical Digest..

[26] James C. Hoe,et al. Single-Chip Heterogeneous Computing: Does the Future Include Custom Logic, FPGAs, and GPGPUs? , 2010, 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture.

[27] Sally A. McKee,et al. Understanding PARSEC performance on contemporary CMPs , 2009, 2009 IEEE International Symposium on Workload Characterization (IISWC).