A compiler framework for automatically mapping data parallel programs to heterogeneous MPSoCs

Many of today's embedded devices are based on Multiprocessor Systems-on-Chip (MPSoCs). Such devices are usually heterogeneous, containing DSPs and specialized accelerators as well as one or more CPUs. This heterogeneity allows efficient implementations in specialized domains but is a barrier to wider use. These devices are difficult to program because only the CPU is directly exposed to the programmer; access to the other resources is restricted to narrow library interfaces. This paper enables heterogeneous resources to be exploited from a high-level parallel programming model. It presents an LLVM-based compiler that maps OpenMP programs onto the underlying heterogeneous cores using an SPMD model of computation. The compiler partitions data and computation across the cores, managing synchronization and memory coherence across different memory domains and operating systems. We evaluate its performance on the OMAP4 MPSoC using a range of data-parallel benchmarks. On average, it achieves a 2.75x speedup over the low-level library approach. Furthermore, it delivers a 1.38x speedup and 1.4x better energy efficiency than using the two A9 cores alone.
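The abstract states that the compiler partitions data and computation across heterogeneous cores under an SPMD model. The following is a minimal sketch of that idea for a single parallel loop, assuming a static partition in which each core receives a contiguous block of iterations proportional to a per-core weight. The weights and the names `spmd_range` and `saxpy_spmd` are hypothetical illustrations, not the paper's actual partitioning scheme or API; the paper's compiler also handles synchronization and cross-domain coherence, which this sketch omits.

```python
def spmd_range(n, weights, core_id):
    """Return the half-open range [lo, hi) of loop iterations for core_id.

    Iterations 0..n-1 are split into contiguous blocks whose sizes are
    proportional to the (hypothetical) per-core weights. Each boundary is
    computed from the cumulative weight with integer division, so the blocks
    tile [0, n) exactly, with no gaps or overlaps.
    """
    total = sum(weights)
    prefix = sum(weights[:core_id])  # combined weight of the cores before core_id
    lo = n * prefix // total
    hi = n * (prefix + weights[core_id]) // total
    return lo, hi


def saxpy_spmd(core_id, a, x, y, weights):
    """SPMD body: every core runs this same function with its own id."""
    lo, hi = spmd_range(len(x), weights, core_id)
    for i in range(lo, hi):
        y[i] = a * x[i] + y[i]


# Hypothetical relative throughputs for three cores (illustrative assumption only).
weights = [2, 2, 1]
x = [float(i) for i in range(10)]
y = [1.0] * 10

# Simulate the cores one after another; on real hardware each id would run
# concurrently on its own core and all cores would then meet at a barrier.
for core_id in range(len(weights)):
    print(core_id, spmd_range(len(x), weights, core_id))
    saxpy_spmd(core_id, 2.0, x, y, weights)
```

With these weights, the ten iterations split as `[0, 4)`, `[4, 8)` and `[8, 10)`. A static, contiguous partition keeps each core's data in one region, which makes it simpler to reason about which memory must be made coherent at the end of the loop; a real mapping would choose the weights from measured or modeled per-core performance.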