Temporal analysis and scheduling of hard real-time radios running on a multi-processor

On a multi-radio baseband system, multiple independent transceivers must share the resources of a multi-processor while each meets its own hard real-time requirements. Not all possible combinations of transceivers are known at compile time, so a solution must either allow independent timing analysis of each transceiver or rely on run-time timing analysis. This thesis proposes a design flow and software architecture that meet these challenges and enable features such as independent transceiver compilation and dynamic loading, while also addressing ease of programming, efficiency, and ease of validation.

We take data flow as the basic model of computation, because it fits the application domain and because several static variants (such as Single-Rate, Multi-Rate and Cyclo-Static) have been shown to possess strong analytical properties. Traditional temporal analysis of data flow can provide minimum throughput guarantees for a self-timed implementation. Since transceivers may also need to guarantee strictly periodic execution and meet latency requirements, we extend the analysis techniques to show that strict periodicity can be enforced for an actor in the graph; we also provide maximum latency analysis techniques for periodic, sporadic and bursty sources.

We propose a scheduling strategy and an automatic scheduling flow that enable the simultaneous execution, on a multiprocessor, of multiple transceivers with hard real-time requirements, described as Single-Rate Data Flow (SRDF) graphs. Each transceiver has its own execution rate and starts and stops independently of the other transceivers, at times unknown at compile time. We show how to combine scheduling and mapping decisions with the input application data flow graph to generate a worst-case temporal analysis graph.
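
As a point of reference for the throughput guarantees mentioned above: for a strongly connected SRDF graph, the steady-state period of self-timed execution is bounded by the maximum cycle ratio, that is, the largest value over all cycles of the summed worst-case execution times divided by the number of initial tokens on the cycle. The sketch below computes this by brute-force cycle enumeration, which is only practical for small graphs. The function name, the graph encoding and the example graph are illustrative assumptions, not taken from the thesis, which does not state which algorithm it uses.

```python
from fractions import Fraction


def max_cycle_ratio(exec_time, edges):
    """Maximum cycle ratio of a small SRDF graph (illustrative sketch).

    exec_time: dict mapping actor name -> worst-case execution time (int)
    edges:     list of (src, dst, initial_tokens) tuples
    Returns a Fraction, or None if the graph has no cycle.
    """
    succ = {actor: [] for actor in exec_time}
    for src, dst, tokens in edges:
        succ[src].append((dst, tokens))

    # Each simple cycle is explored only from its lowest-ranked actor,
    # so it is counted exactly once.
    rank = {actor: i for i, actor in enumerate(exec_time)}
    best = None

    def walk(start, node, visited, time_sum, token_sum):
        nonlocal best
        for nxt, tokens in succ[node]:
            if nxt == start:
                total_tokens = token_sum + tokens
                if total_tokens == 0:
                    raise ValueError("cycle without initial tokens: graph deadlocks")
                ratio = Fraction(time_sum, total_tokens)
                if best is None or ratio > best:
                    best = ratio
            elif nxt not in visited and rank[nxt] > rank[start]:
                walk(start, nxt, visited | {nxt},
                     time_sum + exec_time[nxt], token_sum + tokens)

    for actor in exec_time:
        walk(actor, actor, {actor}, exec_time[actor], 0)
    return best


# Hypothetical three-actor graph, execution times in cycles.
times = {"A": 2, "B": 3, "C": 1}
graph = [("A", "B", 0), ("B", "A", 1),   # cycle A-B: (2 + 3) / 1 token
         ("B", "C", 0), ("C", "B", 2),   # cycle B-C: (3 + 1) / 2 tokens
         ("A", "A", 1)]                  # self-loop on A: 2 / 1 token
period = max_cycle_ratio(times, graph)
print("period bound:", period, "guaranteed throughput:", 1 / period)
```

In this example the A-B cycle dominates, so it alone determines the guaranteed minimum throughput. Practical tools use polynomial-time maximum cycle mean algorithms rather than enumeration.
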
We propose algorithms to find, for each transceiver, a mapping in the form of clusters of statically-ordered actors, together with a budget per cluster for either a Time Division Multiplex (TDM) or a Non-Preemptive Non-Blocking Round Robin (NPNBRR) scheduler. The budget is computed such that, if the platform can provide it, the desired minimum throughput and maximum latency of the transceiver are guaranteed, while the required processing resources are minimized. We illustrate the use of these techniques by mapping a combination of WLAN and TDS-CDMA receivers onto a prototype Software-Defined Radio platform.

The functionality of transceivers for standards with very dynamic behavior, such as WLAN, cannot be conveniently modeled as an SRDF graph, since SRDF cannot express actor firing rules that vary with the values of input data. We therefore propose a restricted, customized data flow model of computation, Mode-Controlled Data Flow (MCDF), that captures the data-value-dependent behavior of a transceiver while still allowing rigorous temporal analysis and tight resource budgeting. We develop a number of analysis techniques to characterize the temporal behavior of MCDF graphs in terms of maximum latency and throughput, and we extend our SRDF scheduling strategy to MCDF. The capabilities of MCDF are then illustrated with a WLAN 802.11a receiver model.

Having computed budgets for each transceiver, we propose a way to use these budgets for run-time resource mapping and admissibility analysis. At run-time, when a transceiver starts, a resource manager allocates the budget of each cluster of statically-ordered actors to platform resources. The resource manager enforces strict admission control, to prevent transceivers from interfering with each other's worst-case temporal behavior.
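
To make the notion of a TDM budget concrete, the sketch below uses a commonly cited conservative bound on the response time of a task that needs C cycles and owns a slice of S cycles in a TDM wheel of P cycles: r = C + (P - S) * ceil(C / S). It then searches for the smallest slice that meets a given response-time target. This is an assumption-laden illustration: the bound, the function names and the numbers are not taken from the thesis, whose budget computation works on whole graphs with throughput and latency constraints rather than on a single task.

```python
def tdm_response_time(exec_cycles, slice_cycles, period_cycles):
    """Conservative worst-case response time under TDM (assumed bound).

    Every started slice is charged the full waiting time for the other
    slices in the wheel, i.e. (period - slice) cycles.
    """
    if not 0 < slice_cycles <= period_cycles:
        raise ValueError("slice must satisfy 0 < slice <= period")
    slices_needed = -(-exec_cycles // slice_cycles)  # integer ceiling
    return exec_cycles + (period_cycles - slice_cycles) * slices_needed


def min_tdm_slice(exec_cycles, period_cycles, max_response):
    """Smallest slice (in cycles) meeting max_response, or None.

    The bound never increases as the slice grows, so the first slice
    that satisfies the target is the minimum.
    """
    for slice_cycles in range(1, period_cycles + 1):
        if tdm_response_time(exec_cycles, slice_cycles, period_cycles) <= max_response:
            return slice_cycles
    return None


# Hypothetical numbers: a 300-cycle cluster on a 1000-cycle TDM wheel
# that must respond within 700 cycles.
print(min_tdm_slice(300, 1000, 700))
```

A resource manager performing admission control would then only need to check that the slices requested on each processor still fit in the unallocated part of its TDM wheel.
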
We propose algorithms adapted from Vector Bin-Packing to map transceivers to the multi-processor architecture at start time. We also consider the case where the processors are connected by a network-on-chip with resource reservation guarantees, in which case we additionally find a routing and a resource allocation on the network-on-chip. In our experiments, our resource allocation algorithms keep 95% of the system resources occupied with an allocation failure rate of less than 5%.

The framework was implemented on a prototype board. We present performance and memory utilization figures for this implementation, as they provide insight into the cost of adopting our approach. For an unoptimized implementation of the framework with no hardware support for synchronization, the scheduling and synchronization overhead is 16.3% of the cycle budget for a WLAN receiver on an EVP processor at 320 MHz. However, this overhead is less than 1% for mobile standards such as TDS-CDMA or LTE, which have lower rates and thus larger cycle budgets. Considering that clock speeds will increase and that the synchronization primitives can be optimized to exploit the addressing modes available in the EVP, these results are very promising.
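
The start-time mapping described above can be pictured as vector bin packing with all-or-nothing admission: each processor is a bin with a vector of free resources, each cluster is an item with a requirement vector, and a transceiver is admitted only if every one of its clusters fits. The sketch below uses a first-fit-decreasing heuristic. The heuristic, the `admit` function and the resource dimensions (cycle budget and memory) are assumptions made for illustration; the thesis's own algorithms are adapted variants and also handle network-on-chip routing, which is omitted here.

```python
def admit(free, clusters):
    """Try to place all clusters of one transceiver (illustrative sketch).

    free:     list of per-processor free-resource vectors; updated in
              place only if the whole transceiver is admitted
    clusters: list of requirement vectors, one per cluster
    Returns the processor index chosen for each cluster, or None if the
    transceiver is rejected (in which case free is left untouched).
    """
    trial = [list(capacity) for capacity in free]  # work on a copy
    placement = [None] * len(clusters)

    # First-fit decreasing: place the most demanding clusters first.
    order = sorted(range(len(clusters)),
                   key=lambda i: sum(clusters[i]), reverse=True)
    for i in order:
        need = clusters[i]
        for proc, capacity in enumerate(trial):
            if all(n <= c for n, c in zip(need, capacity)):
                trial[proc] = [c - n for n, c in zip(need, capacity)]
                placement[i] = proc
                break
        else:
            return None  # some cluster does not fit: reject everything

    free[:] = trial  # commit the allocation
    return placement


# Hypothetical platform: two processors, each offering
# [percent of TDM wheel, KB of local memory].
platform = [[100, 64], [100, 64]]
print(admit(platform, [[60, 20], [50, 30]]))  # first transceiver
print(admit(platform, [[45, 10], [45, 10]]))  # second transceiver
print(platform)
```

Working on a copy and committing only on success keeps a rejected transceiver from disturbing the budgets of transceivers that are already running, which is the property strict admission control is meant to protect.
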
