Synthesis of an application-specific soft multiprocessor system

The application-specific multiprocessor System-on-a-Chip is a promising design alternative because it offers a high degree of flexibility, short development time, and potentially high performance through application-specific optimization. Designing an optimal system of this kind remains challenging, however, because several important metrics, such as throughput, latency, and resource usage, must be explored and optimized. This paper addresses the problem of synthesizing an application-specific multiprocessor system that minimizes latency and resource usage under a throughput constraint. We employ a novel framework for this problem, similar to that of technology mapping in the logic synthesis domain, and develop a set of efficient algorithms, including labeling, clustering, and packing, that generate a multiprocessor architecture with application-specific optimized latency and resource usage. Specifically, the result of our algorithm is latency-optimal for directed acyclic task graphs. Applying our approach to a Motion JPEG example on Xilinx's Virtex-II Pro platform FPGA reveals interesting design tradeoffs.
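
To make the latency metric concrete, the following is a minimal sketch of how the latency of a directed acyclic task graph can be evaluated for a given clustering of tasks onto processors. It is not the paper's labeling, clustering, or packing algorithm. The delay model is an assumption borrowed from clustering formulations in logic synthesis: an edge between tasks in the same cluster costs nothing, an edge that crosses clusters costs a fixed `comm_delay`, and serialization of tasks that share a processor is ignored. The function name, task names, and execution times are all hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def clustered_latency(exec_time, edges, cluster_of, comm_delay):
    """Longest-path latency of a DAG under an assumed clustering delay model.

    exec_time:  dict mapping task -> execution time
    edges:      iterable of (producer, consumer) pairs
    cluster_of: dict mapping task -> cluster (processor) id
    comm_delay: assumed fixed cost of an edge that crosses clusters
    """
    preds = {task: [] for task in exec_time}
    for producer, consumer in edges:
        preds[consumer].append(producer)

    finish = {}
    # static_order() yields every task after all of its predecessors.
    for task in TopologicalSorter(preds).static_order():
        start = 0
        for p in preds[task]:
            delay = 0 if cluster_of[p] == cluster_of[task] else comm_delay
            start = max(start, finish[p] + delay)
        finish[task] = start + exec_time[task]
    return max(finish.values())


# Hypothetical four-task diamond graph.
exec_time = {"a": 2, "b": 3, "c": 1, "d": 2}
edges = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")]

one_task_per_cluster = {"a": 0, "b": 1, "c": 2, "d": 3}
critical_path_merged = {"a": 0, "b": 0, "c": 1, "d": 0}

latency_split = clustered_latency(exec_time, edges, one_task_per_cluster, 4)
latency_merged = clustered_latency(exec_time, edges, critical_path_merged, 4)
```

Under this assumed model, merging the tasks `a`, `b`, and `d` into one cluster lowers the latency from 15 to 13 time units, which illustrates why the choice of clustering affects latency and resource usage at the same time.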
