A Dynamic Contention-aware Application Allocation Algorithm for Many-core Processor

Concurrently executing diverse, independent applications on a many-core processor with hundreds of cores requires allocating application tasks so as to minimize communication contention and communication cost. In this paper, we propose a novel application allocation algorithm that assigns applications onto a many-core processor while considering both the communication between tasks and the contention on network channels. Our dynamic contention-aware application allocation (DC3A) algorithm reduces both external and internal communication contention, as well as communication cost on the network, in two steps. First, a novel edge-centric method arranges the positions of an application's tasks to form a specific rectangular mapping. Second, an efficient method selects a rectangular resource region composed of available cores and allocates the application to it based on that mapping. To evaluate DC3A, we have implemented new thread spawning/joining modules and multi-application synchronization modules in the Graphite simulator. The simulation results for DC3A and peer algorithms show that, as communication density increases, DC3A optimizes network performance better than its peers. We observed reductions in average packet latency (APL) of up to 35.6%, 32.6% and 24.6% compared with the first free (FF), nearest neighbour (NN) and contiguous neighborhood allocation (CoNA) algorithms, respectively.
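To make the two ingredients named in the abstract concrete, the following is a minimal illustrative sketch, not the DC3A algorithm itself. It assumes a 2D mesh represented as a grid of booleans (free or occupied cores), a first-fit scan for a free rectangular region, and a communication cost defined as traffic volume multiplied by Manhattan hop distance. The function names `find_free_rectangle` and `communication_cost`, the first-fit policy, and the cost definition are all assumptions introduced here for illustration.

```python
from typing import Dict, List, Optional, Tuple

Coord = Tuple[int, int]  # (row, col) of a core in the 2D mesh


def find_free_rectangle(free: List[List[bool]], h: int, w: int) -> Optional[Coord]:
    """Return the top-left (row, col) of the first h x w block of free cores.

    Hypothetical first-fit scan in row-major order; returns None if no
    fully free block of that size exists. This is NOT the region
    selection method of DC3A, only an illustration of the idea.
    """
    rows = len(free)
    cols = len(free[0]) if rows else 0
    for r in range(rows - h + 1):
        for c in range(cols - w + 1):
            if all(free[r + i][c + j] for i in range(h) for j in range(w)):
                return (r, c)
    return None


def communication_cost(
    placement: Dict[str, Coord],
    volume: Dict[Tuple[str, str], int],
) -> int:
    """Sum of (traffic volume x Manhattan hop distance) over task pairs.

    Assumed cost model: each communicating pair contributes its volume
    multiplied by the number of mesh hops between the two cores.
    """
    total = 0
    for (src, dst), vol in volume.items():
        r1, c1 = placement[src]
        r2, c2 = placement[dst]
        total += vol * (abs(r1 - r2) + abs(c1 - c2))
    return total
```

Under this assumed cost model, placing heavily communicating tasks on adjacent cores inside the selected rectangle lowers the total, which is the intuition behind arranging tasks into a compact rectangular mapping before choosing a region.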
