System-level power & energy estimation methodology and optimization techniques for CPU-GPU based mobile platforms

Given the growing computational requirements of mobile applications, heterogeneous Multiprocessor Systems-on-Chip have become a necessary solution for meeting service requirements. Today, Electronic System-Level design is considered vital for exploring design trade-offs for such devices early in the design flow. This paper proposes a novel system-level power/energy estimation methodology and optimization techniques for heterogeneous CPU-GPU based platforms. The methodology has two parts. First, we develop generic power models for the different parts of the platform, expressed in terms of functional parameters. Second, we design a simulation-based system-level prototype using SystemC (JIT) and cycle-accurate simulators to accurately evaluate the activities used in those power models. Combining the two parts yields a system-level power estimation methodology that offers a good trade-off between accuracy and speed. Moreover, leveraging our methodology, we introduce novel system-level power optimization techniques for CPU-GPU platforms, such as inter-task DVFS and workload balancing. The efficiency of the proposed methodology and optimization techniques is validated on a CARMA kit, which consists of an ARM quad-core processor and an NVIDIA GPU (96 cores). Estimated power and energy values are compared to real board measurements. Our power/energy estimates show errors below 2.5% for a single-core processor, 4% for a dual-core processor, 4% for a quad-core processor, 4% for the GPU, and 6% for multi-processor based systems. Using the proposed optimization techniques, we achieve power and energy savings of up to 45% and 70%, respectively, on various industrial benchmarks.
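As a rough illustration of the idea behind inter-task DVFS, and not the paper's actual power models, the sketch below assumes the textbook dynamic-power relation P = C·V²·f and a hypothetical table of operating points. For each task it picks the lowest-energy operating point that still meets the task's deadline. The names `OperatingPoint`, `OPPS`, `C_EFF`, and `pick_opp`, and all numeric values, are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class OperatingPoint:
    freq_hz: float    # clock frequency in Hz
    voltage_v: float  # supply voltage in V


# Hypothetical operating points (not taken from the CARMA kit).
OPPS = [
    OperatingPoint(0.5e9, 0.8),
    OperatingPoint(1.0e9, 1.0),
    OperatingPoint(1.5e9, 1.2),
]

C_EFF = 1.0e-9  # assumed effective switched capacitance, in farads


def dynamic_power(opp: OperatingPoint) -> float:
    """Textbook dynamic power estimate: P = C * V^2 * f (watts)."""
    return C_EFF * opp.voltage_v ** 2 * opp.freq_hz


def task_energy(cycles: float, opp: OperatingPoint) -> float:
    """Energy in joules to run `cycles` clock cycles at one operating point."""
    exec_time_s = cycles / opp.freq_hz
    return dynamic_power(opp) * exec_time_s


def pick_opp(cycles: float, deadline_s: float) -> Optional[OperatingPoint]:
    """Return the lowest-energy operating point that meets the deadline,
    or None if no operating point is fast enough."""
    feasible = [o for o in OPPS if cycles / o.freq_hz <= deadline_s]
    if not feasible:
        return None
    return min(feasible, key=lambda o: task_energy(cycles, o))
```

Under this simplified model, energy per task depends only on voltage and cycle count, so the slowest operating point that meets the deadline always wins. A fuller model would also account for static power, which can favor finishing a task quickly and then idling.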
