Machine Learning Computation on Multiple GPUs Using CUDA and Message Passing Interface

In this paper, we present our efforts to implement machine learning models on commodity hardware, namely general-purpose graphics processing units (GPUs) and multiple GPUs connected via the Message Passing Interface (MPI). We consider risk models that require a large number of iterations to compute the probability of default for a credit account, based on Markov chain analysis. We discuss data structures and efficient implementations of machine learning models on the GPU platform. The idea is to leverage fast GPU RAM and thousands of GPU cores to speed up execution and reduce overall runtime. However, increasing the number of GPUs in our experiments also increases programming complexity and the number of I/O operations, which raises the overall turnaround time. We benchmark the scalability and performance of our implementation with respect to data size; performing model computations on huge amounts of data is a compute-intensive and costly task. We propose four configurations combining CPU, GPU, and MPI for machine learning modeling. Experiments on real data show that training a machine learning model on a single GPU outperforms the CPU, multiple GPUs, and GPUs connected with MPI.
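The Markov-chain computation of probability of default described above can be sketched as repeated application of a delinquency-state transition matrix, with "default" treated as an absorbing state. This is a minimal illustrative sketch: the state names and transition probabilities below are assumptions for demonstration, not values from the paper.

```python
# Illustrative sketch: probability of default (PD) via iterated Markov
# transitions over delinquency states. All numbers here are assumed.

STATES = ["current", "30dpd", "60dpd", "default"]  # "default" is absorbing

# TRANSITION[i][j] = P(next state j | current state i); each row sums to 1.
TRANSITION = [
    [0.90, 0.08, 0.00, 0.02],
    [0.40, 0.40, 0.15, 0.05],
    [0.10, 0.20, 0.50, 0.20],
    [0.00, 0.00, 0.00, 1.00],
]

def step(dist, matrix):
    """One Markov step: new_dist[j] = sum_i dist[i] * matrix[i][j]."""
    n = len(dist)
    return [sum(dist[i] * matrix[i][j] for i in range(n)) for j in range(n)]

def probability_of_default(n_steps, start_state=0):
    """Probability mass in the absorbing 'default' state after n_steps."""
    dist = [0.0] * len(STATES)
    dist[start_state] = 1.0
    for _ in range(n_steps):
        dist = step(dist, TRANSITION)
    return dist[STATES.index("default")]

if __name__ == "__main__":
    for horizon in (12, 24, 36):
        print(f"PD after {horizon} periods: {probability_of_default(horizon):.4f}")
```

On a GPU, the inner matrix-vector product (or a full matrix power over many accounts) is the part that parallelizes across thousands of cores, which is the workload the paper's single-GPU, multi-GPU, and MPI configurations compare.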
