Efficient Reconfigurable Global Network-on-Chip Designs towards Heterogeneous CPU-GPU Systems: An Application-Aware Approach

Different applications place different demands on communication between subnets in the global hybrid network-on-chip (NOC) of a heterogeneous CPU-GPU architecture (HSA). An application that needs only several hybrid routers for communication does not justify deploying (at design time) or switching on (at runtime) every hybrid router in the network. Reconfiguring a customized global hybrid NOC is therefore important, because it significantly reduces the cost of deploying and powering these hybrid routers as the network scales up, which makes the problem a natural target for optimization. We consider the problem of optimizing the number of hybrid routers utilized in the global hybrid NOC when the applications and network configurations are known. This problem can be cast as a mixed-integer linear program, a class of problems known to be NP-hard in general. We propose a prediction model that quickly estimates a near-optimal number of utilized hybrid routers. Our evaluation shows that the prediction model reduces solution time by 99 percent on average relative to the conventional model. The models also reduce router utilization by up to 84 percent on average, compared with not using the models. We validated the estimated results by simulating them in HSA to confirm their performance.
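To make the optimization problem concrete, the following is a minimal sketch of a toy version of it, not the formulation used in the paper. It assumes a simplified 0-1 model in which each inter-subnet flow lists the hybrid routers able to carry it, and the goal is to enable as few routers as possible while still serving every flow. The function name `min_hybrid_routers`, the router ids, and the flow names are all hypothetical, and the exhaustive search stands in for a real solver only because the instance is tiny.

```python
from itertools import combinations


def min_hybrid_routers(candidates, flows):
    """Return a smallest set of routers that serves every flow, or None.

    candidates: iterable of router ids (hypothetical names).
    flows: dict mapping a flow name to the set of router ids able to carry it.
    This is an assumed toy model: one binary on/off decision per router and
    one coverage constraint per flow.
    """
    candidates = sorted(candidates)
    # Try subsets in order of increasing size, so the first feasible
    # subset found uses the minimum number of routers.
    for k in range(len(candidates) + 1):
        for subset in combinations(candidates, k):
            enabled = set(subset)
            # Feasible when every flow has at least one enabled router.
            if all(enabled & usable for usable in flows.values()):
                return enabled
    return None  # at least one flow cannot be served by any candidate


# Hypothetical application profile: four CPU-to-GPU flows, four router sites.
routers = ["R0", "R1", "R2", "R3"]
flows = {
    "cpu0->gpu0": {"R0", "R1"},
    "cpu1->gpu0": {"R1", "R2"},
    "cpu2->gpu1": {"R2", "R3"},
    "cpu3->gpu1": {"R3"},
}

chosen = min_hybrid_routers(routers, flows)
saved = 1 - len(chosen) / len(routers)
print(sorted(chosen), f"{saved:.0%} of candidate routers left off")
```

The search visits up to 2^n subsets for n candidate routers, which illustrates why exact solutions become expensive as the network scales and why a fast predictor of the near-optimal router count is attractive.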
