Parallelizing High-Level Synthesis: A Code Transformational Approach to High-Level Synthesis

Synthesis is the process of generating circuit implementations from descriptions of what a component does. It consists in part of refinement and elaboration, together with transformations and optimizations at multiple levels, so that the generated circuits compare favorably with manual designs. High-level synthesis (HLS), or behavioral synthesis, refers specifically to circuit synthesis from algorithmic (or behavioral) descriptions. In this chapter, we focus on the developments in code transformation techniques for HLS over the past two decades, and describe recent progress in coordinated compiler and HLS transformations that seek to make effective use of advances in parallelizing compiler techniques. We describe how coordinated compiler and HLS techniques can yield efficient circuits through examples of a class of source-level and dynamic transformations. We also describe recent developments in system-level modeling techniques and languages that attempt to raise the level of abstraction in the design process.

Gaurav Singh (Virginia Tech), Sumit Gupta (Tensilica Inc.), Sandeep Shukla (Virginia Tech), Rajesh Gupta (UC San Diego)
