Paper information - OpenMP-style parallelism in data-centered multicore computing with R

R is a domain-specific language widely used for data analysis by the statistics community as well as by researchers in finance, biology, the social sciences, and many other disciplines. Because R programs are tied to their input data, the exponential growth of available data makes high-performance computing with R imperative. Transforming a sequential program into a parallel version would make writing parallel R programs considerably easier for R users. In this paper, we present our work on semi-automatic parallelization of R code with user-added OpenMP-style pragmas. While such pragmas are used at the frontend, we take advantage of multiple parallel backends provided by different R packages. We provide flexibility for importing parallelism through plug-in components, impose built-in MapReduce for data processing, and maintain code reusability. We illustrate the advantages of these on-the-fly mechanisms, which can lead to significant applications in data-centered parallel computing.

Lei Jiang | George Ostrouchov | Ferdinand Jamitzky | Pragneshkumar B. Patel
