Implementation-Aware Model Analysis: The Case of Buffer-Throughput Tradeoff in Streaming Applications

Models of computation abstract away a number of implementation details in favor of well-defined semantics. While this has unquestionable benefits, we argue that analysis of models based solely on operational semantics (implementation-oblivious analysis) is unfit to drive implementation design space exploration. Specifically, we study the tradeoff between buffer size and streaming throughput in applications modeled as synchronous data flow (SDF) graphs. We demonstrate the inherent inaccuracy of the implementation-oblivious approach, which considers only SDF operational semantics. We propose a rigorous transformation that equips the state-of-the-art buffer-throughput tradeoff analysis technique with implementation awareness. Extensive empirical evaluation shows that our approach yields significantly more accurate estimates of streaming throughput at the model level, while running two orders of magnitude faster than cycle-accurate simulation of implementations.
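To make the buffer-throughput tradeoff concrete, the following is a minimal sketch of an implementation-oblivious analysis on a toy two-actor SDF graph A -> B connected by one bounded channel. It is not the transformation proposed in the paper. The function name `simulate`, the rates `p` and `c`, the execution times `t_a` and `t_b`, and the rule that buffer space is claimed when a firing starts and released when the consuming firing ends are all assumptions chosen for illustration.

```python
def simulate(cap, p=1, c=1, t_a=1, t_b=1, horizon=1000):
    """Self-timed execution of a toy SDF graph A -> B with a bounded channel.

    Hypothetical model (assumed for illustration only):
      - A produces `p` tokens per firing and takes `t_a` time units.
      - B consumes `c` tokens per firing and takes `t_b` time units.
      - A claims `p` free slots when it starts; B frees `c` slots when it ends.
      - Each actor runs at most one firing at a time.
    Returns the number of completed firings of B per time unit.
    """
    tokens = 0        # tokens currently available to B
    free = cap        # free buffer slots currently available to A
    a_done = None     # completion time of A's ongoing firing, if any
    b_done = None     # completion time of B's ongoing firing, if any
    b_firings = 0

    for now in range(horizon + 1):
        # First complete any firing that ends at this instant.
        if a_done == now:
            tokens += p
            a_done = None
        if b_done == now:
            free += c
            b_firings += 1
            b_done = None
        # Then start every actor that is idle and enabled.
        if a_done is None and free >= p:
            free -= p
            a_done = now + t_a
        if b_done is None and tokens >= c:
            tokens -= c
            b_done = now + t_b

    return b_firings / horizon


if __name__ == "__main__":
    for cap in (1, 2, 3):
        print(cap, simulate(cap))
```

With these toy parameters, a capacity of 1 forces A and B to alternate and gives roughly 0.5 firings of B per time unit, while a capacity of 2 lets them overlap and gives roughly 1.0; a larger buffer brings no further gain. The sketch accounts only for operational semantics and ignores any implementation cost, which is the kind of estimate the abstract argues is inaccurate.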
