Design space exploration in application-specific hardware synthesis for multiple communicating nested loops

Application-specific MPSoCs are often used to implement high-performance data-intensive applications. MPSoC design requires rapid and efficient exploration of the hardware architecture possibilities to adequately orchestrate the data distribution and the architecture of the parallel MPSoC computing resources. Behavioral specifications of data-intensive applications are usually given as loop-based sequential code, which must be parallelized and scheduled into tasks for an efficient MPSoC implementation. Existing approaches to application-specific hardware synthesis use loop transformations to efficiently parallelize single nested loops, and use Synchronous Data Flow models to statically schedule and balance the data production and consumption of multiple communicating loops. This separates the analysis of data parallelism from that of task parallelism, which can reduce the opportunities for throughput optimization in high-performance data-intensive applications. This paper proposes a method for concurrent exploration of data and task parallelism that uses loop transformations to optimize data transfer and storage mechanisms for both single and multiple communicating nested loops. The method provides orchestrated application-specific decisions on communication architecture, memory hierarchy and computing resource parallelism. It is computationally efficient and produces high-performance architectures.
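
To make the Synchronous Data Flow balancing mentioned above concrete, the following is a minimal sketch of the standard balance equations for a simple chain of communicating loops (actors). It is not the method proposed in the paper; the function name `repetition_vector` and the chain-shaped graph are assumptions chosen for illustration. For each edge, the number of firings `q` must satisfy `q[i] * produce == q[i+1] * consume`, so that every token produced in one schedule iteration is also consumed.

```python
from fractions import Fraction
from math import gcd


def repetition_vector(edge_rates):
    """Smallest integer firing counts for a chain of SDF actors.

    edge_rates: list of (produce, consume) pairs, one per edge, where
    edge i connects actor i to actor i+1 (hypothetical chain topology).
    """
    # Solve the balance equations with actor 0 normalized to one firing.
    q = [Fraction(1)]
    for produce, consume in edge_rates:
        q.append(q[-1] * produce / consume)

    # Scale all rational firing counts to integers.
    scale = 1
    for value in q:
        scale = scale * value.denominator // gcd(scale, value.denominator)
    counts = [int(value * scale) for value in q]

    # Divide by the common factor to obtain the minimal solution.
    common = 0
    for count in counts:
        common = gcd(common, count)
    return [count // common for count in counts]


if __name__ == "__main__":
    # Loop A produces 2 tokens per firing, loop B consumes 3 and produces 1,
    # loop C consumes 2.
    print(repetition_vector([(2, 3), (1, 2)]))
```

In this example the first loop fires three times, the second twice and the third once per schedule iteration, which balances production and consumption on both edges. A schedule derived this way fixes task-level rates independently of how each loop is internally parallelized, which is the separation between data and task parallelism that the abstract refers to.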
