The effects of datapath placement and C-slow retiming on three computational benchmarks

Summary form only given. Two important optimizations within the FPGA design process, C-slow retiming and datapath placement, offer significant benefits for designers. Many have advocated and implemented tools that apply these techniques in both automatic and semiautomatic fashion, but the techniques have not made their way into conventional FPGA toolflows. C-slow retiming is a method of accelerating computations that include feedback loops. Instead of carrying a single instance of the computation, the feedback loop is pipelined so that C separate instances are calculated simultaneously. This allows fine-grained pipelining even in designs that include feedback loops, such as single-round cryptographic implementations or microprocessors. Done properly, it imposes a significant but not prohibitive latency penalty on any single computation while offering large increases in throughput. Datapath placement means constructing the design in a manner that accounts for the higher-level data flows. This offers several benefits, including improved performance, more physically compact designs, shorter wires, and faster place-and-route times when the FPGA is heavily utilized. Even for less structured designs, which are amenable to simulated annealing, datapath placement may still offer a significant benefit. To demonstrate the importance of these optimizations, we have hand-modified three computational benchmarks that represent significant themes within FPGA computation: Rijndael/AES encryption, Smith/Waterman, and a simplified 32-bit microprocessor datapath. All three represent significantly different modes of computation within FPGAs, but all gain significantly from the use of these techniques.
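The interleaving behind C-slow retiming can be illustrated with a small behavioral model. The sketch below is not taken from the benchmarks; the function names (`step`, `run_single`, `run_c_slow`) and the toy mixing round are hypothetical, and it assumes all streams have the same length. It models only the C registers placed in the feedback loop, not the subsequent retiming that shortens the clock period.

```python
from collections import deque


def step(state, x):
    """Hypothetical feedback-loop logic: a toy 32-bit mixing round."""
    return ((state * 31) ^ x) & 0xFFFFFFFF


def run_single(inputs, init=0):
    """Original design: one register in the feedback loop, one stream."""
    state = init
    for x in inputs:
        state = step(state, x)
    return state


def run_c_slow(streams, init=0):
    """C-slowed design: C registers in the loop, C interleaved streams.

    On every clock cycle the logic consumes the oldest register value
    together with the next input of the stream that owns it, and the
    result re-enters the register chain. Each stream therefore advances
    once every C cycles (assumes equal-length streams).
    """
    c = len(streams)
    regs = deque([init] * c)  # the C registers in the feedback path
    cycles = 0
    for i in range(len(streams[0])):
        for s in range(c):
            state = regs.popleft()
            regs.append(step(state, streams[s][i]))
            cycles += 1
    # After a whole number of rounds the registers are back in stream order.
    return list(regs), cycles


if __name__ == "__main__":
    streams = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    finals, cycles = run_c_slow(streams)
    # Each interleaved stream matches the result of running it alone.
    assert finals == [run_single(s) for s in streams]
    print(finals, cycles)
```

In this model each stream sees C cycles of latency per iteration, which is the latency penalty described above. The throughput gain comes only after the extra registers are retimed into the logic so that the clock can run faster, a step this sketch does not model.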
