BRAM-LUT Tradeoff on a Polymorphic DES Design

A polymorphic implementation of the DES algorithm is presented. The polymorphic approach allows very fast integration of the DES hardware into existing software implementations, significantly reducing the time to market and the development costs associated with hardware integration. This paper focuses on the tradeoff between implementing the DES S-boxes in LUTs or in BRAMs. The FPGA implementation results suggest a LUT reduction on the order of 100 slices (approximately 37%) for the full DES core, at the expense of 4 embedded memory blocks (BRAMs). Even with the associated delay increase, the use of BRAMs improves the throughput-per-slice ratio by 20%. The proposed computational structure has been implemented on a Xilinx Virtex-II Pro (XC2VP30) prototyping device, requiring approximately 2% of the device resources. Experimental results, at an operating frequency of 100 MHz, suggest a throughput of 400 Mbit/s for DES and 133 Mbit/s for 3DES for the proposed polymorphic implementation. Compared with a software implementation of the DES algorithm, a speedup of 200 times can be achieved for the kernel computation.
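
The reported throughput figures are consistent with a simple back-of-the-envelope calculation. The sketch below assumes an iterative core that processes one 64-bit block in 16 clock cycles (one DES round per cycle) and that 3DES is computed as three sequential DES passes on the same core. These are assumptions for illustration, not details stated in the abstract, and the function name `throughput_mbps` is hypothetical.

```python
def throughput_mbps(clock_mhz, block_bits=64, cycles_per_block=16):
    """Throughput in Mbit/s for a core that emits one block every
    `cycles_per_block` clock cycles (assumed: one DES round per cycle)."""
    return clock_mhz * block_bits / cycles_per_block


# DES: 16 rounds per 64-bit block at 100 MHz.
des = throughput_mbps(100)

# 3DES: assumed to be three sequential DES passes, i.e. 48 cycles per block.
triple_des = throughput_mbps(100, cycles_per_block=3 * 16)

print(f"DES:  {des:.0f} Mbit/s")
print(f"3DES: {triple_des:.0f} Mbit/s")
```

Under these assumptions the calculation gives 400 Mbit/s for DES and roughly 133 Mbit/s for 3DES at 100 MHz, matching the figures quoted above.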