Low Contention Mapping of Real-Time Tasks onto TilePro 64 Core Processors

Predictability of task execution is paramount for real-time systems so that upper bounds of execution times can be determined via static timing analysis. Static timing analysis on network-on-chip (NoC) processors may result in unsafe underestimations when the underlying communication paths are not considered. This stems from contention on the underlying network when data from multiple sources share parts of a routing path in the NoC. Contention analysis must therefore be performed to provide safe and reliable bounds. In addition, the overhead incurred by contention due to inter-process communication (IPC) can be reduced by mapping tasks to cores in such a way that contention is minimized. This paper makes several contributions to increase the predictability of real-time tasks on NoC architectures. First, we contribute a constraint solver that exhaustively maps real-time tasks onto cores to minimize contention and improve predictability. Second, we develop a novel TDMA-like approach that maps communication traces into time frames, so that temporally disjoint communication can be analyzed separately. Third, we contribute a novel multi-heuristic approximation, H Solver, for rapid discovery of low-contention solutions. H Solver reduces contention by up to 70% when compared with naive and constrained exhaustive solutions. We evaluate our approach using a micro-benchmark of task system IPC on the TilePro64, a real, physical NoC processor with 64 cores. To the best of our knowledge, this is the first work to consider IPC for worst-case time frames to simplify analysis and to measure the impact on actual hardware for NoC-based real-time multicore systems.
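
To make the notion of contention-aware mapping concrete, the following is a minimal sketch, not the paper's solver. It assumes a 2D mesh with dimension-ordered (X first, then Y) routing and counts how many flows share a directed link. The function names (xy_route, contention, best_mapping) and the contention metric (the sum over links of users minus one) are hypothetical choices made for illustration only. The brute-force search is only feasible for very small task sets, which is one reason a heuristic approach is attractive.

```python
from collections import Counter
from itertools import permutations

def xy_route(src, dst):
    """Directed links (from_tile, to_tile) on an assumed X-then-Y route."""
    (x, y), (dx, dy) = src, dst
    links = []
    while x != dx:                      # travel along X first
        nx = x + (1 if dx > x else -1)
        links.append(((x, y), (nx, y)))
        x = nx
    while y != dy:                      # then travel along Y
        ny = y + (1 if dy > y else -1)
        links.append(((x, y), (x, ny)))
        y = ny
    return links

def contention(mapping, flows):
    """Illustrative metric: for every link, count the flows beyond the first."""
    usage = Counter()
    for sender, receiver in flows:
        for link in xy_route(mapping[sender], mapping[receiver]):
            usage[link] += 1
    return sum(n - 1 for n in usage.values() if n > 1)

def best_mapping(tasks, tiles, flows):
    """Exhaustively try every placement of tasks on tiles (tiny inputs only)."""
    best = None
    for placement in permutations(tiles, len(tasks)):
        mapping = dict(zip(tasks, placement))
        score = contention(mapping, flows)
        if best is None or score < best[0]:
            best = (score, mapping)
    return best

if __name__ == "__main__":
    flows = [("A", "B"), ("C", "D")]    # hypothetical IPC pairs
    interleaved = {"A": (0, 0), "B": (2, 0), "C": (1, 0), "D": (3, 0)}
    adjacent = {"A": (0, 0), "B": (1, 0), "C": (2, 0), "D": (3, 0)}
    print(contention(interleaved, flows), contention(adjacent, flows))
```

In the interleaved placement both flows cross the link from tile (1, 0) to tile (2, 0), whereas placing communicating tasks next to each other removes the overlap. In a time-frame-based analysis, the same count would be computed per frame over only the flows active in that frame.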
