Exploring performance tradeoffs for clustered VLIW ASIPs

VLIW ASIPs provide an attractive solution for increasingly pervasive real-time multimedia and signal processing embedded applications. In this paper, we propose an algorithm to support trade-off exploration during the early phases of the design/specialization of VLIW ASIPs with clustered datapaths. For the purposes of this early exploration step, we define a parameterized family of clustered datapaths D(m,n), where m and n denote the interconnect capacity and cluster capacity constraints on the family, respectively. Given a kernel, the proposed algorithm explores the space of feasible clustered datapaths and returns a datapath configuration, a binding and schedule for the operations, and a corresponding estimate of the best achievable latency over the specified family. Moreover, we show how the parameters m and n, together with a target latency optionally specified by the designer, can be used to effectively explore trade-offs among delay, power/energy, and latency. Extensive empirical evidence shows that the proposed approach is strikingly effective at attacking this complex optimization problem.
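To make the shape of the exploration problem concrete, the following is a minimal brute-force sketch in Python. It is not the algorithm proposed in the paper. It rests on a toy model that is entirely an assumption of this example: operations have unit latency, n is read as the maximum number of operations a cluster can issue per cycle, m is read as the maximum number of inter-cluster transfers that can start per cycle, and each transfer takes one cycle. The names `schedule`, `explore`, `KERNEL_OPS`, and `KERNEL_DEPS` are hypothetical.

```python
from itertools import product

# Hypothetical kernel: a small reduction tree, g = (a op b) op (c op d).
# Operations are listed in topological order; deps maps an op to its producers.
KERNEL_OPS = ["a", "b", "c", "d", "e", "f", "g"]
KERNEL_DEPS = {"e": ("a", "b"), "f": ("c", "d"), "g": ("e", "f")}


def schedule(ops, deps, binding, m, n):
    """Greedy list schedule for a fixed binding of ops to clusters.

    Assumed toy model (not from the paper): unit-latency ops, at most n ops
    issued per cluster per cycle, at most m inter-cluster transfers started
    per cycle, one cycle per transfer. Returns (latency, start_cycles), or
    None if a transfer is needed but m == 0.
    """
    start = {}   # op -> issue cycle
    issued = {}  # (cycle, cluster) -> ops issued so far
    moves = {}   # cycle -> transfers started so far
    for op in ops:
        ready = 0
        for producer in deps.get(op, ()):
            available = start[producer] + 1
            if binding[producer] != binding[op]:
                if m == 0:
                    return None
                # Reserve the earliest transfer slot after the producer ends.
                t = available
                while moves.get(t, 0) >= m:
                    t += 1
                moves[t] = moves.get(t, 0) + 1
                available = t + 1
            ready = max(ready, available)
        # Issue in the earliest cycle with spare capacity on the bound cluster.
        cycle = ready
        while issued.get((cycle, binding[op]), 0) >= n:
            cycle += 1
        issued[(cycle, binding[op])] = issued.get((cycle, binding[op]), 0) + 1
        start[op] = cycle
    return max(start.values()) + 1, start


def explore(ops, deps, max_clusters, m, n):
    """Enumerate cluster counts and bindings; keep the lowest latency found.

    Returns (latency, num_clusters, binding, start_cycles). Ties keep the
    configuration with fewer clusters, since smaller counts are tried first.
    """
    best = None
    for k in range(1, max_clusters + 1):
        for assignment in product(range(k), repeat=len(ops)):
            binding = dict(zip(ops, assignment))
            result = schedule(ops, deps, binding, m, n)
            if result is None:
                continue
            latency, start = result
            if best is None or latency < best[0]:
                best = (latency, k, binding, start)
    return best


best_latency, best_clusters, best_binding, best_start = explore(
    KERNEL_OPS, KERNEL_DEPS, max_clusters=2, m=1, n=1
)
print(best_latency, best_clusters, best_binding)
```

Exhaustive enumeration of bindings grows exponentially with the number of operations, so a sketch like this is only usable on very small kernels. It is meant to illustrate the three outputs named above (a configuration, a binding and schedule, and a latency estimate) and how tightening m or n changes the result.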
