Data-Driven Thread Execution on Heterogeneous Processors

In this paper we report our experience in implementing and evaluating the Data-Driven Multithreading (DDM) model on a heterogeneous multi-core processor. DDM is a non-blocking multithreading model that decouples the synchronization portions of a program from its computation portions, allowing them to execute asynchronously in a dataflow manner. Thread dependencies are determined by the compiler or programmer, while thread scheduling is performed dynamically at runtime based on data availability. The target processor for this implementation is the Cell processor, and we call the implementation the Data-Driven Multithreading Virtual Machine for the Cell processor (DDM-$\mathrm{VM}_c$). Thread scheduling is handled in software by the Cell's Power Processing Element (PPE) core, while the Synergistic Processing Element (SPE) cores execute the program threads. DDM-$\mathrm{VM}_c$ virtualizes the parallel resources of the Cell, handles the heterogeneity of its cores, manages the Cell memory hierarchy efficiently, and supports distributed execution across a cluster of Cell nodes. DDM-$\mathrm{VM}_c$ has been implemented on a single Cell processor with six computation cores, as well as on a cluster of four Cell processors with 24 computation cores. We present an in-depth performance analysis of DDM-$\mathrm{VM}_c$ using a suite of standard computational benchmarks. The evaluation shows that DDM-$\mathrm{VM}_c$ scales well and effectively tolerates scheduling overheads as well as memory and communication latencies. Furthermore, DDM-$\mathrm{VM}_c$ compares favorably with other platforms targeting the Cell processor, such as CellSs and Sequoia.
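The scheduling principle described above (dependencies fixed in advance, threads issued only when their inputs are available) can be illustrated with a minimal Python sketch. It assumes a ready-count scheme: each thread records how many producer threads must complete before it may run, and a single scheduler loop, playing the role the PPE plays in DDM-$\mathrm{VM}_c$, decrements these counts as threads finish and issues any thread whose count reaches zero to a pool of workers standing in for the SPEs. The class and method names (`DataDrivenScheduler`, `add_thread`, `run`) and the default of six workers are hypothetical illustrations, not the DDM-$\mathrm{VM}_c$ API, and the sketch ignores the Cell's local-store memory management and cluster distribution.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait


class DataDrivenScheduler:
    """Hypothetical sketch of data-driven thread scheduling with ready counts."""

    def __init__(self):
        self.bodies = {}       # thread id -> callable (the computation portion)
        self.consumers = {}    # thread id -> ids of threads that consume its results
        self.ready_count = {}  # thread id -> number of producers not yet completed

    def add_thread(self, tid, body, producers=()):
        # Dependencies are declared up front, as the compiler/programmer would.
        self.bodies[tid] = body
        self.consumers.setdefault(tid, [])
        self.ready_count[tid] = len(producers)
        for p in producers:
            self.consumers.setdefault(p, []).append(tid)

    def run(self, workers=6):
        """Scheduler loop (assumed PPE role); pool threads stand in for SPEs."""
        pending = {}  # future -> id of the thread executing on a worker
        order = []    # completion order, as observed by the scheduler
        with ThreadPoolExecutor(max_workers=workers) as pool:

            def issue(tid):
                pending[pool.submit(self.bodies[tid])] = tid

            # Threads with no producers are ready from the start.
            for tid, rc in self.ready_count.items():
                if rc == 0:
                    issue(tid)

            while pending:
                done, _ = wait(pending, return_when=FIRST_COMPLETED)
                for fut in done:
                    tid = pending.pop(fut)
                    fut.result()  # propagate any exception raised by the body
                    order.append(tid)
                    # Synchronization step: only the scheduler touches ready counts.
                    for c in self.consumers[tid]:
                        self.ready_count[c] -= 1
                        if self.ready_count[c] == 0:
                            issue(c)

        if len(order) != len(self.bodies):
            raise RuntimeError("some threads never became ready (cycle or missing producer)")
        return order


# Usage: a diamond-shaped dependency graph A -> {B, C} -> D.
env = {}
sched = DataDrivenScheduler()
sched.add_thread("A", lambda: env.update(x=3))
sched.add_thread("B", lambda: env.update(y=env["x"] * 2), producers=["A"])
sched.add_thread("C", lambda: env.update(z=env["x"] + 1), producers=["A"])
sched.add_thread("D", lambda: env.update(r=env["y"] + env["z"]), producers=["B", "C"])
order = sched.run(workers=2)
print(order, env["r"])
```

Because only the scheduler loop updates the ready counts, the worker threads never block waiting on dependencies or contend for synchronization state, which mirrors the decoupling of synchronization from computation. In the diamond example, B and C may complete in either order, but A always completes first and D last.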