Transformation-Based Exploration of Data Parallel Architecture for Customizable Hardware: A JPEG Encoder Case Study

In this paper, we present a method for designing MPSoCs for complex data-intensive applications. The method jointly explores the communication architecture, the memory system architecture, and the parallelism of the computation resources. We illustrate it on a JPEG encoder case study, describing every design step. The resulting JPEG encoder implementation achieves an 84% increase in throughput and a 64% increase in the maximum achievable FPGA frequency (fmax), with an area overhead of 6 with respect to a reference solution. We also assess the method with additional explorations of applications from different domains.
