Improving FPGA-based SHA-3 structures

This work focuses on FPGA-based implementations of the SHA-3 hash functions. The literature classifies existing implementations according to the structural optimization techniques they adopt, namely folding, pipelining, and unrolling. The structures proposed in the state of the art vary mainly in the level of folding and the number of pipeline stages. Unfolded structures achieve higher throughput, whereas folded structures require fewer area resources at the cost of lower throughput. Folding also increases design complexity, because the step-mappings introduce data dependencies within the round. As suggested by the literature, the best results are achieved with slice-wise folding rather than lane-wise folding. With this approach, the resulting structure processes 16 slices on each iteration. However, special care must be taken with the data dependencies in the θ and ρ step-mappings, so that the input values needed to compute the slices are available on each iteration. The ρ step-mapping dependencies were solved by re-scheduling the round computation as R_resc = θ ∘ ι ∘ χ ∘ π ∘ ρ (composition applied right to left, so ρ is evaluated first and θ last). This makes it possible to split the round computation into two parts, one computing θ and the other computing π, χ, and ι, with the ρ step-mapping embedded into the state memory. This approach, which trades off area against throughput, mitigates the data dependencies and improves the Throughput per Area efficiency by up to 50% relative to the existing state of the art.
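The re-scheduled round can be checked against the standard round order with a small software model. The Python sketch below is illustrative only and does not model the slice-wise hardware structure: it works on whole 64-bit lanes and merges ρ and π into one function. The function names are hypothetical, and the boundary handling (one initial θ before the first round, no θ after the last round) is an assumption about how the re-scheduling is applied across the 24 rounds of Keccak-f[1600].

```python
MASK64 = (1 << 64) - 1

def rol64(v, n):
    """Rotate a 64-bit lane left by n positions."""
    n %= 64
    return ((v << n) | (v >> (64 - n))) & MASK64

def theta(a):
    """Theta: XOR each lane with the parities of two neighbouring columns."""
    c = [a[x][0] ^ a[x][1] ^ a[x][2] ^ a[x][3] ^ a[x][4] for x in range(5)]
    d = [c[(x + 4) % 5] ^ rol64(c[(x + 1) % 5], 1) for x in range(5)]
    return [[a[x][y] ^ d[x] for y in range(5)] for x in range(5)]

def rho_pi(a):
    """Rho (lane rotations) followed by pi (lane permutation)."""
    b = [row[:] for row in a]
    x, y = 1, 0
    current = b[x][y]
    for t in range(24):
        x, y = y, (2 * x + 3 * y) % 5
        current, b[x][y] = b[x][y], rol64(current, (t + 1) * (t + 2) // 2)
    return b

def chi(a):
    """Chi: non-linear mixing along each row."""
    return [[a[x][y] ^ (~a[(x + 1) % 5][y] & a[(x + 2) % 5][y])
             for y in range(5)] for x in range(5)]

def round_constants(n_rounds=24):
    """Generate the iota round constants with the 8-bit LFSR."""
    constants, r = [], 1
    for _ in range(n_rounds):
        rc = 0
        for j in range(7):
            r = ((r << 1) ^ ((r >> 7) * 0x71)) % 256
            if r & 2:
                rc ^= 1 << ((1 << j) - 1)
        constants.append(rc)
    return constants

RC = round_constants()

def iota(a, i):
    """Iota: XOR the round constant into lane (0, 0)."""
    b = [row[:] for row in a]
    b[0][0] ^= RC[i]
    return b

def keccak_f_standard(a):
    """Standard order: R = iota . chi . pi . rho . theta, 24 times."""
    for i in range(24):
        a = iota(chi(rho_pi(theta(a))), i)
    return a

def keccak_f_rescheduled(a):
    """Re-scheduled order: R_resc = theta . iota . chi . pi . rho.

    Assumption: theta is applied once up front and skipped after the
    final round, so the overall permutation is unchanged.
    """
    a = theta(a)
    for i in range(24):
        a = iota(chi(rho_pi(a)), i)   # second part of the split round
        if i < 23:
            a = theta(a)              # first part, belonging to round i + 1
    return a

# Both schedules must yield the same permutation output.
state = [[x + 5 * y for y in range(5)] for x in range(5)]
assert keccak_f_standard(state) == keccak_f_rescheduled(state)
```

Because the two schedules differ only in where the round boundary is drawn, the final assertion holds for any input state. In the sketch, θ sits alone between the boundaries while ρ, π, χ, and ι are grouped together, which mirrors the two-part split described above.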