Combining module selection and replication for throughput-driven streaming programs

Stream processing is widely adopted in data-intensive applications across many domains. FPGAs are commonly used to implement these applications because they can exploit the applications' inherent data parallelism and pipelining to achieve better performance. In this paper we investigate the design space exploration (DSE) problem that arises when mapping streaming applications onto FPGAs. Previous work has focused narrowly on a single technique, either replication or module selection, to meet the throughput target. We propose combining these two techniques to guide design space exploration, and we present a formal formulation of, and a solution to, this combined problem. Our objective is to minimize the total area cost subject to a throughput constraint. In particular, our approach handles feedback loops in streaming programs, which, to the best of our knowledge, no previous work has addressed. We evaluate our methodology with high-level synthesis tools and demonstrate our workflow on a set of benchmarks ranging from small kernels such as FFT to large designs such as an MPEG-4 decoder.
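To illustrate how module selection and replication interact, here is a minimal Python sketch under simplifying assumptions: the streaming program is an acyclic synchronous dataflow graph, each actor fires a fixed number of times per graph iteration, and each candidate module for an actor (for example, one HLS implementation) is characterized only by an area cost and an initiation interval (II). All names (`Variant`, `replicas_needed`, `select_and_replicate`) and all numbers are hypothetical. Under these assumptions, r copies of a module with interval II sustain r/II firings per cycle, so the cheapest (variant, replica count) pair for each actor can be found by enumerating that actor's variants independently. This is not the formulation presented in the paper; in particular it ignores feedback loops, where actors on a cycle are coupled and this per-actor decomposition no longer holds.

```python
from dataclasses import dataclass
from fractions import Fraction
from math import ceil


@dataclass(frozen=True)
class Variant:
    """Hypothetical implementation choice for one actor (e.g. one HLS result)."""
    name: str
    area: int  # area cost of a single instance (e.g. LUTs)
    ii: int    # initiation interval: cycles between successive firings


def replicas_needed(variant: Variant, firings_per_iter: int, target: Fraction) -> int:
    # r copies with interval ii sustain r/ii firings per cycle; the actor must
    # sustain firings_per_iter * target firings per cycle, where target is the
    # required throughput in graph iterations per cycle.
    return max(1, ceil(firings_per_iter * target * variant.ii))


def select_and_replicate(actors, target: Fraction):
    """actors: dict actor name -> (firings per iteration, list of Variants).

    Returns (total_area, plan), where plan maps each actor to
    (chosen variant name, number of replicas). Assumes an acyclic graph,
    so each actor can be optimized independently.
    """
    plan = {}
    total_area = 0
    for actor, (q, variants) in actors.items():
        # Pick the variant whose replicated area is smallest.
        best = min(variants, key=lambda v: replicas_needed(v, q, target) * v.area)
        r = replicas_needed(best, q, target)
        plan[actor] = (best.name, r)
        total_area += r * best.area
    return total_area, plan


# Toy instance: one graph iteration every 4 cycles.
actors = {
    "fft": (1, [Variant("fast", 400, 1), Variant("slow", 120, 8)]),
    "filter": (2, [Variant("pipelined", 150, 1), Variant("serial", 60, 4)]),
}
total, plan = select_and_replicate(actors, Fraction(1, 4))
print(total, plan)  # 360 {'fft': ('slow', 2), 'filter': ('serial', 2)}
```

In this toy instance the combined search chooses the small, slow variants and replicates each twice, for a total area of 360. Module selection alone (one instance per actor) would be forced onto the fast variants to meet the target, costing 400 + 150 = 550.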
