Paper Information - Scaling Computation on GPUs Using Powerlists

Scaling Computation on GPUs Using Powerlists

With the explosion of big data analytics, scaling linear algebra packages has become extremely important. In the context of GPUs, the cuBLAS API provides a highly efficient package for linear algebra subroutines on a single GPU. Because inputs often have large dimensions, it frequently becomes necessary to compute over clusters. However, the package does not provide facilities for computing efficiently over a cluster of GPUs. In this paper, we demonstrate a high-level framework for scaling linear algebra computations across a cluster of GPUs, using the matrix multiplication problem as the example. In particular, we describe a method of specifying matrices using powerlists that captures both parallelism and recursion succinctly, and that automatically schedules partitioned matrices over a GPU cluster, so that cuBLAS can be used to compute the products of the partitioned matrices across the cluster. Our experimental results show significant performance gains, of the order of at least 132% for large matrices over a single-GPU computation. The method reflects the map-reduce paradigm: the matrices are mapped to appropriate partitioned matrices and sent to appropriate members of the cluster, and the results are collected to obtain the resultant matrix.
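The partition-and-collect scheme described in the abstract can be illustrated with a minimal sketch. This is not the paper's framework or notation: it assumes square matrices whose dimension is a power of two, and the names `split`, `tie`, `local_gemm`, and `matmul` are hypothetical. `local_gemm` is a plain-Python stand-in for the single-GPU product that the paper delegates to cuBLAS, and no actual cluster scheduling is performed.

```python
from itertools import product


def split(m):
    """Untie a 2^k x 2^k matrix into four quadrants keyed by (row, col)."""
    h = len(m) // 2
    return {
        (0, 0): [row[:h] for row in m[:h]],
        (0, 1): [row[h:] for row in m[:h]],
        (1, 0): [row[:h] for row in m[h:]],
        (1, 1): [row[h:] for row in m[h:]],
    }


def tie(q):
    """Inverse of split: reassemble four quadrants into one matrix."""
    top = [left + right for left, right in zip(q[0, 0], q[0, 1])]
    bottom = [left + right for left, right in zip(q[1, 0], q[1, 1])]
    return top + bottom


def add(x, y):
    """Element-wise sum of two equally sized matrices."""
    return [[a + b for a, b in zip(rx, ry)] for rx, ry in zip(x, y)]


def local_gemm(x, y):
    """Stand-in for a single-GPU matrix product (assumption: cuBLAS in the paper)."""
    cols = list(zip(*y))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in x]


def matmul(a, b, leaf=2):
    """Recursive block product: map eight sub-products, then reduce pairwise."""
    if len(a) <= leaf:
        return local_gemm(a, b)
    qa, qb = split(a), split(b)
    # Map step: the eight block products are independent, so each could be
    # sent to a different cluster member. Here they simply run in sequence.
    partial = {
        (i, j, k): matmul(qa[i, k], qb[k, j], leaf)
        for i, j, k in product((0, 1), repeat=3)
    }
    # Reduce step: C[i][j] = A[i][0] * B[0][j] + A[i][1] * B[1][j].
    qc = {
        (i, j): add(partial[i, j, 0], partial[i, j, 1])
        for i, j in product((0, 1), repeat=2)
    }
    return tie(qc)
```

In this sketch the recursion depth is controlled by `leaf`, the block size below which the product is computed locally. A real deployment would choose that size from GPU memory limits and communication cost, which the sketch does not model.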

R. K. Shyamasundar | Anshu S. Anand
