Integrated Inference for Hardware-Software Efficiency: A Case Study in SpMV and Smart Memories

Energy efficiency is the fundamental challenge in computing. Dennard scaling has stopped: Moore’s Law still provides more transistors, but power density now increases with integration. These power densities, combined with Amdahl’s Law, will also limit the efficiency and tractability of future multi-core integration. Without process scaling and parallelism to drive efficiency, we must rely on customization and integrated design. However, customization has become prohibitively expensive, primarily because hardware and software are difficult to design together. Recent advances in software and hardware generators make customization easier: a generator constrains a design space, parameterizes the remaining degrees of freedom, and automatically produces a functional implementation for any combination of parameter values. To address these challenges, we propose using generators more effectively by creating an integrated design framework that captures the key interactions between hardware and software. We demonstrate a proof of concept for sparse matrix-vector multiply (SpMV) on an embedded processor hardware base, using statistical regression modeling. With models that capture the highly non-monotonic SpMV performance topologies, we perform integrated optimization and demonstrate a 5.0x gain in performance (Mflop/s) while reducing the energy cost per operation by 10 percent (0.9x nJ/Flop).
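For readers unfamiliar with the kernel, the following is a minimal sketch of SpMV over a matrix stored in compressed sparse row (CSR) form. The CSR layout, the function name `spmv_csr`, and the small example matrix are illustrative assumptions and are not taken from the study; the tuned kernels and hardware parameters explored in the work are not reproduced here.

```python
def spmv_csr(row_ptr, col_idx, values, x):
    """Compute y = A * x for a sparse matrix A in CSR form (illustrative sketch).

    row_ptr[i]..row_ptr[i+1] delimits the nonzeros of row i;
    col_idx[k] and values[k] give the column and value of nonzero k.
    """
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        acc = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            # Indirect access into x is what makes SpMV sensitive to
            # the memory hierarchy.
            acc += values[k] * x[col_idx[k]]
        y[i] = acc
    return y


# Hypothetical 3x3 example matrix:
# [[4, 0, 1],
#  [0, 2, 0],
#  [3, 0, 5]]
row_ptr = [0, 2, 3, 5]
col_idx = [0, 2, 1, 0, 2]
values = [4.0, 1.0, 2.0, 3.0, 5.0]
x = [1.0, 2.0, 3.0]

y = spmv_csr(row_ptr, col_idx, values, x)

# One multiply and one add per stored nonzero.
flops = 2 * len(values)
```

Rates such as Mflop/s and nJ/Flop are conventionally normalized by this count of two floating-point operations per stored nonzero.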
