Vectorized AES Core for High-throughput Secure Environments

Parallelism has long been used to increase the throughput of applications that process independent data. With the advent of multicore technology, designers and programmers are increasingly forced to think in parallel. In this paper we present the evaluation of an encryption core capable of handling multiple data streams. The design targets future Internet scenarios in which throughput requirements, together with privacy and integrity, will be critical for both personal and corporate users. To support such scenarios, we present a technique that makes cryptographic cores use memory bandwidth more efficiently: we propose feeding cryptographic engines with multiple streams to better exploit the available bandwidth. To validate our claims, we have developed an AES core capable of encrypting two streams in parallel in either ECB or CBC mode. Our AES core implementation consumes a trivial amount of resources when targeting a Virtex-II Pro FPGA device.
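To illustrate why multiple streams help, consider CBC mode: each block of a stream depends on the previous ciphertext block of that same stream, so a single stream cannot keep a parallel engine busy, whereas blocks from different streams are independent. The sketch below is a minimal software model of that idea, not the hardware design evaluated in the paper. The function names are hypothetical, the streams are assumed to have equal length, and `toy_block_encrypt` is a hash-based stand-in for one AES block encryption (it is not AES and is not invertible).

```python
import hashlib

BLOCK = 16  # AES block size in bytes


def toy_block_encrypt(key: bytes, block: bytes) -> bytes:
    """Stand-in for one AES block encryption (assumption: NOT real AES)."""
    return hashlib.sha256(key + block).digest()[:BLOCK]


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def cbc_encrypt(key: bytes, iv: bytes, blocks: list) -> list:
    """Reference CBC chaining over a single stream."""
    prev, out = iv, []
    for block in blocks:
        prev = toy_block_encrypt(key, xor(block, prev))
        out.append(prev)
    return out


def cbc_encrypt_interleaved(keys: list, ivs: list, streams: list) -> list:
    """Encrypt several equal-length streams in lockstep.

    Within one step, the block operations of different streams do not
    depend on each other, so a multi-stream core could issue them together.
    """
    prevs = list(ivs)
    outs = [[] for _ in streams]
    for step in zip(*streams):  # one block from every stream per step
        for i, block in enumerate(step):
            prevs[i] = toy_block_encrypt(keys[i], xor(block, prevs[i]))
            outs[i].append(prevs[i])
    return outs
```

Interleaving changes only the order in which blocks are issued; each stream's ciphertext is the same as if it had been encrypted alone with `cbc_encrypt`.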
