
Gables: A Roofline Model for Mobile SoCs

Over a billion mobile consumer system-on-chip (SoC) chipsets ship each year, and smartphones account for a significant share of that market. Most modern smartphones contain advanced SoCs made up of multiple cores, GPUs, and many different programmable and fixed-function accelerators, connected via a complex hierarchy of interconnects, with the goal of running a dozen or more critical software usecases under strict power, thermal, and energy constraints. The steadily growing complexity of modern SoCs challenges computer architects to decide how best to carry out early-stage ideation. Late-stage SoC design typically relies on detailed full-system simulation once the hardware is specified and the accelerator software is written or ported. Early-stage SoC design, however, must often select accelerators before a single line of software is written. To help frame SoC thinking and guide early-stage mobile SoC design, this paper contributes the Gables model. Gables refines and retargets the Roofline model (originally designed for the performance and bandwidth limits of a multicore chip) to model each accelerator on a SoC, to apportion work concurrently among different accelerators (justified by our usecase analysis), and to calculate an upper bound on SoC performance. We evaluate the Gables model with an existing SoC and develop several extensions that allow Gables to inform early-stage mobile SoC design.
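The recipe described above (a Roofline per accelerator, a concurrent split of work, and a SoC-wide upper bound) can be illustrated with a short Python sketch. It assumes each IP block i has a peak compute rate, its own link bandwidth, a work fraction f_i, and an operational intensity I_i, and that all IPs share a single DRAM bandwidth. Under those assumptions, each busy IP caps total throughput at its Roofline bound divided by f_i, and shared DRAM caps it at the DRAM bandwidth divided by the total bytes moved per unit of work. The class, function, and parameter names and the example numbers are hypothetical; this is one reading of the approach, not the paper's reference formulation.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Accelerator:
    """One IP block on the SoC (hypothetical field names, illustrative only)."""
    name: str
    peak_ops: float       # peak compute rate of this IP, ops/s
    link_bw: float        # bandwidth of this IP's own link, bytes/s
    work_fraction: float  # f_i: share of the usecase's work assigned to this IP
    intensity: float      # I_i: operational intensity of that work, ops/byte


def roofline(peak_ops: float, bandwidth: float, intensity: float) -> float:
    """Classic Roofline bound: attainable ops/s = min(peak, bandwidth * intensity)."""
    return min(peak_ops, bandwidth * intensity)


def gables_bound(accels: list[Accelerator], dram_bw: float) -> float:
    """Upper bound on total SoC throughput (ops/s) with all IPs working concurrently.

    Assumed formulation: each busy IP caps the total rate at roofline_i / f_i,
    and shared DRAM caps it at dram_bw / sum_i(f_i / I_i).
    """
    if abs(sum(a.work_fraction for a in accels) - 1.0) > 1e-9:
        raise ValueError("work fractions must sum to 1")

    limits = []
    bytes_per_op = 0.0  # DRAM bytes moved per op of total work
    for a in accels:
        if a.work_fraction == 0:
            continue  # an idle IP imposes no limit
        # This IP completes its share f_i at roofline_i ops/s, so the whole
        # usecase cannot run faster than roofline_i / f_i.
        limits.append(roofline(a.peak_ops, a.link_bw, a.intensity) / a.work_fraction)
        bytes_per_op += a.work_fraction / a.intensity
    # All IPs share one DRAM interface, so their combined traffic must fit in dram_bw.
    limits.append(dram_bw / bytes_per_op)
    return min(limits)


if __name__ == "__main__":
    # Illustrative numbers only, not measurements of any real SoC.
    cpu = Accelerator("CPU", peak_ops=40e9, link_bw=15e9, work_fraction=0.25, intensity=8.0)
    gpu = Accelerator("GPU", peak_ops=400e9, link_bw=20e9, work_fraction=0.75, intensity=16.0)
    print(gables_bound([cpu, gpu], dram_bw=25e9))  # -> 160000000000.0 (CPU-limited)
```

In this example the CPU's quarter of the work sets the bound (160 Gops/s), ahead of the shared DRAM limit (320 Gops/s) and the GPU limit (about 427 Gops/s); shifting more work to the GPU could raise it, which is the kind of early-stage what-if comparison the model is intended to support.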

Mark D. Hill | Vijay Janapa Reddi
