Scaling the Power Wall: A Path to Exascale
William J. Dally, Justin Luitjens, David W. Nellans, Peng Wang, Oreste Villa, Daniel R. Johnson, Stephen W. Keckler, Paulius Micikevicius, Mike O'Connor, Evgeny Bolotin, Nikolai Sakharnykh, Anthony Scudiero