Enhancing the Actual Throughput of the AES Algorithm on the Pascal GPU Architecture

The Advanced Encryption Standard (AES) is widely used to secure data communication at different security levels because it offers higher efficiency and stronger security than other encryption algorithms. The Graphics Processing Unit (GPU) is one of the most important platforms for accelerating AES. Unfortunately, the actual throughput of AES on a GPU is hard to improve because of the CPU-GPU data transfer overhead. In this paper, the AES-ECB algorithm is implemented on an NVIDIA GTX 1080 (Pascal architecture). We use two different techniques to overcome the data transfer overhead: the streaming technique and the unified memory technique. Our results show that the actual throughput of AES using the streaming technique is 80 Gbps, about 2 times that obtained with the unified memory technique. Furthermore, we achieve a kernel throughput of 280 Gbps using a granularity of 32 bytes per thread and shared-memory key storage.
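The gap between kernel throughput and actual throughput, and the reason streaming helps, can be illustrated with a simple timing model. The sketch below is not the paper's measurement method. It assumes the input is split into equal chunks, that each chunk passes through three stages (host-to-device copy, kernel, device-to-host copy) with fixed per-chunk times, and that the three stages can overlap perfectly across chunks. The function names and any stage times passed in are hypothetical.

```python
def serial_time(stage_times, n_chunks):
    """Total time when copy-in, kernel, and copy-out of every chunk
    run strictly one after another (no overlap)."""
    return n_chunks * sum(stage_times)


def pipelined_time(stage_times, n_chunks):
    """Idealized total time when stages of different chunks overlap.

    The first chunk fills the pipeline (sum of all stages); every later
    chunk adds only one slot of the slowest stage. Assumes each stage has
    its own hardware resource, which is an assumption of this model.
    """
    return sum(stage_times) + (n_chunks - 1) * max(stage_times)


def throughput_gbps(total_bytes, seconds):
    """Convert a byte count and an elapsed time into gigabits per second."""
    return total_bytes * 8 / seconds / 1e9


def kernel_throughput_gbps(total_bytes, kernel_seconds):
    """Throughput counting kernel time only, ignoring all transfers."""
    return throughput_gbps(total_bytes, kernel_seconds)
```

In this model the kernel throughput depends only on kernel time, while the actual throughput is bounded by the slowest stage once the pipeline is full. When transfers are slower than the kernel, a faster kernel alone therefore does not raise the actual throughput, which is consistent with the abstract's emphasis on reducing transfer overhead.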
