GPUs deliver high performance at low cost for applications with a high degree of parallelism. Support for integer and logical instructions in the latest generation of GPUs makes cipher algorithms easier to implement. However, decisions such as the granularity of parallel processing and the allocation of variables to memory place a heavy burden on programmers. This paper therefore presents the results of several experiments conducted to clarify the relation between the memory allocation style of AES variables and the granularity of the parallelism exploited in AES encryption, using CUDA on an NVIDIA GeForce GTX285 (Nvidia Corp.). The experiments showed that a granularity of 16 bytes/thread gave the highest performance, achieving a throughput of approximately 35 Gbps. They also showed that memory allocation and granularity change the performance of a standard implementation by roughly 2%–30%. These results indicate that the choice of granularity and memory allocation is the most important factor for efficient AES encryption on a GPU. Moreover, an implementation that overlaps processing with data transfer achieved a throughput of 22.5 Gbps, including data transfer time.
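To make the notion of granularity concrete, the following is a minimal host-side C++ sketch of how the bytes-per-thread setting determines the number of threads and how they map onto AES blocks. It is not the implementation evaluated here: the helper names are hypothetical, the 4 bytes/thread case is only an illustrative alternative, and the sketch assumes that the granularity divides the 16-byte AES block evenly.

```cpp
#include <cstddef>

// AES operates on 128-bit (16-byte) blocks.
constexpr std::size_t kAesBlockBytes = 16;

// Hypothetical helper: number of threads needed to cover `total_bytes`
// of plaintext when each thread processes `bytes_per_thread` bytes
// (rounded up so a partial tail is still covered).
constexpr std::size_t threads_needed(std::size_t total_bytes,
                                     std::size_t bytes_per_thread) {
    return (total_bytes + bytes_per_thread - 1) / bytes_per_thread;
}

// Hypothetical helper: number of threads that share one AES block.
// Assumes bytes_per_thread divides kAesBlockBytes evenly.
constexpr std::size_t threads_per_aes_block(std::size_t bytes_per_thread) {
    return kAesBlockBytes / bytes_per_thread;
}

// Hypothetical helper: index of the AES block that a given global
// thread index works on.
constexpr std::size_t aes_block_of_thread(std::size_t thread_index,
                                          std::size_t bytes_per_thread) {
    return (thread_index * bytes_per_thread) / kAesBlockBytes;
}

// At 16 bytes/thread, each thread owns one whole AES block.
static_assert(threads_per_aes_block(16) == 1, "one thread per AES block");
// At 4 bytes/thread, four threads share each AES block.
static_assert(threads_per_aes_block(4) == 4, "four threads per AES block");
```

Under these assumptions, 1 MiB of plaintext needs 65,536 threads at 16 bytes/thread and 262,144 threads at 4 bytes/thread. In the finer case, the threads sharing a block would have to exchange intermediate state between AES steps, which is one reason the choice of granularity interacts with where variables are stored.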