Password recovery using MPI and CUDA

Using passwords to verify a user's identity is the most widely deployed method for electronic authentication. When system administrators need to recover lost passwords or test accounts for easily guessable passwords, the task can require millions of hash computations and string comparisons. These operations are computationally expensive but easily parallelizable, because each password can be tested independently. High performance computing (HPC) can therefore greatly reduce the time required for password recovery. Because the problem exposes a high degree of fine-grained parallelism, GPU computing with the Compute Unified Device Architecture (CUDA) can improve performance further. The scale of the computation can be increased again by using multiple GPUs, but this requires communication between the GPU devices, and the added communication latency can reduce overall performance. In this work a well-established HPC framework, the Message Passing Interface (MPI), was used to handle communication between the devices and to minimize latency. This allowed a coarse-grained division of the problem using MPI, in which each device applies a fine-grained division of the problem using CUDA to perform the actual calculations. This paper describes three dictionary-based password recovery algorithms that use both MPI and CUDA. In this approach the hash values of known words are computed and compared with the hash values of unknown user passwords. The algorithms differed in GPU memory utilization and in how the data was divided and distributed among the MPI nodes and GPU devices. A divided dictionary algorithm split the dictionary of potential passwords across the GPUs and copied the password database to each GPU. A divided password database algorithm split the password database and copied the full dictionary of potential passwords to each GPU. A minimal memory algorithm split the password database and processed individual passwords sequentially on the GPUs.
The divided dictionary and the divided password database algorithms performed well, achieving speedups of 57x and 40x, respectively, over a single processor when using 8 GPUs across 4 compute nodes. The minimal memory algorithm ran significantly slower than a single CPU, which illustrates the cost of communication latency between MPI nodes and GPUs. The two faster algorithms are shown to scale well to multiple GPUs, so the approach could be applied on larger systems to process larger password databases. In addition to recovering lost passwords, this work could help improve the security of computer systems by identifying accounts with weak or common passwords. The framework described may also be useful for other research that needs to process large amounts of data with similar characteristics using MPI and CUDA.
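
The two data divisions that performed well can be illustrated with a small, purely sequential sketch. It is not the authors' MPI/CUDA implementation: each "worker" is simulated by a loop iteration rather than a GPU, SHA-256 is an assumed stand-in because the abstract does not name a hash function, and every function name below is hypothetical.

```python
import hashlib


def hash_word(word):
    # Assumption: SHA-256 stands in for whatever hash the real system uses.
    return hashlib.sha256(word.encode("utf-8")).hexdigest()


def split_evenly(items, parts):
    """Split items into `parts` contiguous chunks whose sizes differ by at most one."""
    base, extra = divmod(len(items), parts)
    chunks, start = [], 0
    for i in range(parts):
        end = start + base + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def index_by_digest(password_db):
    """Map each stored digest to the list of users that share it."""
    index = {}
    for user, digest in password_db.items():
        index.setdefault(digest, []).append(user)
    return index


def recover_on_worker(words, digest_index):
    """Work of one simulated worker: hash each candidate word and look it up."""
    found = {}
    for word in words:
        for user in digest_index.get(hash_word(word), ()):
            found[user] = word
    return found


def divided_dictionary(dictionary, password_db, workers):
    """Each worker gets a slice of the dictionary and a full copy of the database."""
    full_index = index_by_digest(password_db)
    recovered = {}
    for word_chunk in split_evenly(dictionary, workers):
        recovered.update(recover_on_worker(word_chunk, full_index))
    return recovered


def divided_database(dictionary, password_db, workers):
    """Each worker gets a slice of the database and a full copy of the dictionary."""
    recovered = {}
    for db_chunk in split_evenly(list(password_db.items()), workers):
        recovered.update(recover_on_worker(dictionary, index_by_digest(dict(db_chunk))))
    return recovered
```

Both functions return the same mapping from user to recovered password; they differ only in which data set is partitioned and which is replicated. In a real deployment the outer loop would correspond to the coarse-grained MPI division and the inner hashing loop to the fine-grained CUDA kernel work.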
