A comprehensive methodology to determine optimal coherence interfaces for many-accelerator SoCs

Modern systems-on-chip (SoCs) include not only general-purpose CPUs but also specialized hardware accelerators. Typically, there are three coherence models to choose from when integrating an accelerator with the memory hierarchy: no coherence, coherence with the last-level cache (LLC), and full coherence through a private cache. However, there has been little research on which coherence models are optimal for the accelerators of a complex many-accelerator SoC. This paper focuses on determining a cost-aware coherence interface for an SoC and its target application: finding the coherence model for each accelerator that optimizes power and performance, considering both workload characteristics and system-level contention. A novel comprehensive methodology is proposed that uses Bayesian optimization to efficiently find cost-aware coherence interfaces for SoCs modeled with the gem5-Aladdin architectural simulator. For a complete analysis, gem5-Aladdin is extended to support LLC coherence in addition to the already-supported no-coherence and full-coherence models. For a heterogeneous SoC targeting applications with varying amounts of accelerator-level parallelism, the proposed framework rapidly finds cost-aware coherence interfaces that show significant performance and power benefits over other commonly used coherence interfaces.
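
To make the size of the search problem concrete, the sketch below enumerates the design space implied by the abstract: one of three coherence models per accelerator, giving 3^N candidate interfaces for N accelerators. The accelerator names, the `toy_cost` function, and its penalty values are hypothetical placeholders, not taken from the paper; in the described methodology each evaluation would be a gem5-Aladdin simulation, and Bayesian optimization would replace the exhaustive search used here.

```python
from itertools import product

# The three coherence models named in the abstract.
COHERENCE_MODELS = ("no_coherence", "llc_coherent", "full_coherent")


def design_space(accelerators):
    """Yield every assignment of one coherence model per accelerator."""
    for choice in product(COHERENCE_MODELS, repeat=len(accelerators)):
        yield dict(zip(accelerators, choice))


def toy_cost(config):
    """Hypothetical stand-in for a simulator run (assumption, not the paper's cost)."""
    penalty = {"no_coherence": 3.0, "llc_coherent": 2.0, "full_coherent": 2.5}
    return sum(penalty[model] for model in config.values())


# Hypothetical accelerator names, used only for illustration.
accelerators = ["fft", "gemm", "stencil"]

configs = list(design_space(accelerators))
best = min(configs, key=toy_cost)

print(len(configs))  # number of candidate interfaces: 3 ** 3
print(best, toy_cost(best))
```

Exhaustive enumeration is only workable for a handful of accelerators, since the candidate count triples with each one added; that growth is the reason a sample-efficient search is attractive when every evaluation is an expensive simulation.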
