Towards a Domain Specific Solution for a New Generation of Wireless Modems

Wireless cellular Systems on Chip (SoCs) are experiencing unprecedented demands on data rate, latency, and use case variety. 5G wireless technologies require a massive number of antennas and complex signal processing to improve bandwidth and spectral efficiency. The Internet of Things is causing a proliferation in the number of connected devices and in service categories, such as ultra-reliable low latency, which will produce new use cases such as self-driving cars, robotic factories, and remote surgery. In addressing these challenges, we can no longer rely on faster cores, or even on more silicon. Modem software development is becoming increasingly difficult and error prone as the complexity of the applications and the architectures increases. In this report we propose a Wireless Domain Specific Solution that takes a dataflow acceleration approach and addresses the need of the SoC to support dataflows that change with use case and user activity, while maintaining the Firm Real Time High Availability, with a low probability of Heisenbugs, that is required in cellular modems. We do this by developing a Domain Specific Architecture (DSA) that describes the requirements in a suitably abstracted dataflow Domain Specific Language. We describe a toolchain that automates the translation of those requirements in an efficient and robust manner and provides formal guarantees against Heisenbugs. The dataflow-native DSA supports the toolchain output with specialized processing, data management, and control features that deliver high performance at low power, and it recovers rapidly from dropped dataflows while continuing to meet the real-time requirements. This report focuses on the dataflow acceleration in the DSA and on the part of the automated toolchain that formally checks the performance and correctness of software running on this dataflow hardware. Results are presented and a summary of future work is given.
