On run-time exploitation of concurrency

The 'free' speed-up stemming from ever-increasing processor speed is over. Performance increases in computer systems can now only be achieved through parallelism. One of the biggest challenges in computer science is how to map applications onto parallel computers. Concurrency, seen as the set of valid traces through a program, is utilized by translating it into actual parallelism, i.e. into the simultaneous execution of multiple computations. With higher degrees of unpredictability (both with regard to the actual workload and to the availability of resources), more can be gained from making scheduling and resource management decisions at run-time, when more information (such as resource availability and required QoS level) is available. In cases where concurrency is data-dependent, programming models and their supporting run-time systems also benefit from exposing concurrency when that data is known, viz. at run-time. In this thesis, two systems for run-time exploitation of concurrency are discussed.

The first system is an on-line spatial resource manager for real-time streaming applications, especially in energy-constrained environments. In embedded systems, these applications typically require QoS guarantees, are structurally stable (do not change over time) and are active for a (relatively) long period of time. With increasing complexity, embedded systems increasingly consist of many independent processors with varying degrees of specialization. Designing systems in such a way is beneficial for flexibility, yield increase and energy conservation. However, exploiting such a heterogeneous multi-processor system in order to realize these benefits requires that the resources it provides are dynamically assigned to applications.

A formal and precise definition of this on-line spatial resource management problem is given in this thesis, and qualitative evaluation criteria are introduced by which on-line spatial resource managers can be compared. Constraints on applications and techniques for their modelling are discussed. Since the complexity of this problem is prohibitive and the time constraints for making choices are tight, a heuristic approach is introduced. In this approach, the complete problem of spatial resource management is partitioned into the subproblems of binding, mapping, routing, and QoS validation. The subproblems are ordered in the sense that choices made for the solutions to earlier subproblems are considered fixed when solving later subproblems. Since the subproblems still have a high complexity, algorithms and approaches from literature are adapted to partition them further. The adapted algorithms are implemented in Kairos, a proof-of-concept on-line spatial resource manager for heterogeneous multi-processor systems. A large use case, taken from a state-of-the-art industrial application, is used to explore Kairos' capabilities. With this use case and with a synthetic benchmark, Kairos is shown to be a successful proof-of-concept implementation for on-line spatial resource management and, thus, the problem is shown to be solvable with acceptable concessions.

The second system deals with applications whose behaviour is hard or even impossible to predict to the extent that is necessary to fulfil real-time requirements. In particular, this holds for applications for which the amount of concurrency is highly data-dependent and the work done by different tasks in an application is unbalanced, variable and unpredictable. For these applications, performance cannot be guaranteed, but by exposing (data-dependent) concurrency at run-time, an application's performance and the total system's utilization can be improved. The system discussed here is SNet. It is developed at the University of Hertfordshire and comprises a coordination language, a programming model and a run-time system. A great strength of SNet is that it allows for the separation of concerns between application engineering and concurrency engineering. The application engineer does not program individual threads with their synchronization and communication, but decomposes the application into small units of work on a stream of input data. In this thesis, a denotational semantics for SNet is presented, together with a proof that under those semantics SNet is prefix monotonic, i.e. for every finite prefix of the input stream, a prefix of the output stream exists that is unchanged by further input. Furthermore, a novel execution model is presented that exposes significantly more concurrency than the former execution model. A strong indication is given that a schedule exists such that the novel execution model does not introduce non-termination.
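The prefix-monotonicity property stated above can be illustrated with a small sketch. It is a toy check on finite Python lists, not the SNet semantics from the thesis; the helper names `is_prefix`, `prefix_monotonic_on` and `running_sums` are assumptions introduced for illustration only. A stream function passes the check on a sample when extending the input never retracts output that was already produced.

```python
from itertools import accumulate


def is_prefix(xs, ys):
    """Return True if list xs is a prefix of list ys."""
    return ys[:len(xs)] == xs


def prefix_monotonic_on(f, stream):
    """Check, for one finite sample, that each longer input prefix
    yields an output that extends the output of the shorter prefix."""
    outputs = [f(stream[:n]) for n in range(len(stream) + 1)]
    return all(is_prefix(a, b) for a, b in zip(outputs, outputs[1:]))


def running_sums(xs):
    """Emit one partial sum per input element (output never changes later)."""
    return list(accumulate(xs))


sample = [3, 1, 2]
print(prefix_monotonic_on(running_sums, sample))  # True
print(prefix_monotonic_on(sorted, sample))        # False: later input reorders earlier output
```

Sorting fails the check because output emitted for a short prefix may have to be revised once more input arrives, which is exactly what prefix monotonicity rules out. Passing on one sample is only evidence, not a proof of the property.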