A Clockless Computing System Based on the Static Dataflow Paradigm

The challenges posed by next-generation exascale computing systems may require a critical re-examination of both architecture design and the established wisdom on programming style and execution models, because such systems are expected to comprise thousands of processors with thousands of cores per chip. How to build exascale architectures remains an open question. This paper presents a novel computing system based on a configurable architecture and a static dataflow execution model. We assume that the basic computational unit is a dataflow graph. Each processing node consists of an ad hoc kernel processor, designed to manage and schedule dataflow graphs, and a manycore dataflow execution engine, designed to execute them. The main components of the dataflow execution engine are the Dataflow Actor Cores (DACs), which are small, identical, and configurable. The major contributions of this paper are: i) a machine language, named D#, that represents the low-level static configuration information of the system; ii) a self-scheduled clockless mechanism that starts operations solely on the presence of validity tokens; iii) a design that avoids the need for temporary storage of tokens on the links of the DACs. Our preliminary tests on FPGA-based hardware show the feasibility of this approach.
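
As a rough illustration of token-driven firing, the following Python sketch simulates a small static dataflow graph in software. It is only an analogy under stated assumptions: the names `Link`, `Actor`, `run`, and `evaluate` are hypothetical, the polling loop stands in for what the hardware would do without a clock, and nothing here reflects the actual D# language or the DAC design. Each link carries at most one value, and `None` plays the role of an absent validity token.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Link:
    """Hypothetical link carrying at most one token; None means no valid token."""
    value: Optional[float] = None

    @property
    def valid(self) -> bool:
        return self.value is not None


@dataclass
class Actor:
    """Hypothetical two-input actor (an assumption, not the paper's DAC)."""
    op: Callable[[float, float], float]
    left: Link
    right: Link
    out: Link

    def can_fire(self) -> bool:
        # Firing rule: both operands carry a valid token and the output is free.
        return self.left.valid and self.right.valid and not self.out.valid

    def fire(self) -> None:
        # Produce the result token and consume both input tokens.
        self.out.value = self.op(self.left.value, self.right.value)
        self.left.value = None
        self.right.value = None


def run(actors: List[Actor]) -> int:
    """Fire enabled actors until none is enabled; return the number of firings."""
    firings = 0
    progress = True
    while progress:
        progress = False
        for actor in actors:
            if actor.can_fire():
                actor.fire()
                firings += 1
                progress = True
    return firings


def evaluate(a: float, b: float, c: float, d: float):
    """Evaluate (a + b) * (c - d) on a three-actor graph."""
    la, lb, lc, ld = Link(a), Link(b), Link(c), Link(d)
    s, t, result = Link(), Link(), Link()
    graph = [
        # The multiplier is listed first on purpose: listing order does not
        # decide execution order, token availability does.
        Actor(lambda x, y: x * y, s, t, result),
        Actor(lambda x, y: x + y, la, lb, s),
        Actor(lambda x, y: x - y, lc, ld, t),
    ]
    firings = run(graph)
    return result.value, firings


if __name__ == "__main__":
    print(evaluate(1, 2, 7, 3))
```

In this sketch the multiplier cannot fire until the adder and the subtractor have both produced their tokens, so the graph schedules itself from data availability alone, with no program counter or instruction ordering involved.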
