A high-performance data path for synthesizing DSP kernels

This paper introduces a high-performance data path for implementing digital signal processing (DSP) kernels. The data path is built from flexible computational components (FCCs). Each FCC is a purely combinational circuit that can implement any 2×2 template (cluster) of primitive resources, so the data path's performance benefits from intracomponent chaining of operations. Because the FCC structure is flexible, the data path can be implemented with a small number of such components. This allows direct connections among FCCs and enables intercomponent chaining, which further improves performance. The universality and flexibility of the FCC also allow simple and efficient algorithms to perform scheduling and binding of the data flow graph (DFG). DSP benchmarks synthesized with the FCC data path method show significant performance improvements compared with template-based data path designs. Detailed results on execution time, FCC utilization, and area are presented.
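
To illustrate why chaining reduces execution time, the following is a minimal sketch and not the paper's scheduling or binding algorithm. It assumes a DFG given as a dictionary of predecessor lists, unlimited components, and a clock period long enough to cover the chained operations. The names `asap_levels`, `cycles`, `chain_depth`, and the example graph are hypothetical. With one operation level per cycle, latency equals the DFG depth; if two dependent levels can be chained in one cycle, as the two rows of a 2×2 template would suggest, the cycle count roughly halves.

```python
from math import ceil


def asap_levels(dfg):
    """Return the 1-based ASAP level of every node.

    dfg maps each node name to the list of its predecessor nodes.
    """
    levels = {}

    def level(node):
        # A node sits one level below its deepest predecessor.
        if node not in levels:
            levels[node] = 1 + max((level(p) for p in dfg[node]), default=0)
        return levels[node]

    for node in dfg:
        level(node)
    return levels


def cycles(dfg, chain_depth):
    """Cycles needed when chain_depth dependent levels fit in one cycle.

    Assumption: resources are unlimited, so only the critical path matters.
    """
    depth = max(asap_levels(dfg).values())
    return ceil(depth / chain_depth)


# Hypothetical DFG for y = ((a*b) + (c*d)) * e + f
dfg = {
    "m1": [],            # a * b
    "m2": [],            # c * d
    "a1": ["m1", "m2"],  # m1 + m2
    "m3": ["a1"],        # a1 * e
    "a2": ["m3"],        # m3 + f
}

print(cycles(dfg, chain_depth=1))  # no chaining: 4 cycles
print(cycles(dfg, chain_depth=2))  # two chained levels per cycle: 2 cycles
```

The sketch ignores resource constraints, template width, and the longer clock period that chaining may require, all of which a real scheduler would have to account for.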
