R is a domain-specific language widely used for data analysis by the statistics community as well as by researchers in finance, biology, the social sciences, and many other disciplines. Because R programs are tied to their input data, the exponential growth of available data makes high-performance computing with R imperative. Code transformation from a sequential program to a parallel version would ease the process of writing parallel programs in R. In this paper, we present our work on semi-automatic parallelization of R code with user-added OpenMP-style pragmas. While such pragmas are used at the frontend, we take advantage of multiple parallel backends provided by different R packages. We provide flexibility for importing parallelism with plug-in components, incorporate built-in MapReduce for data processing, and maintain code reusability. We illustrate the advantages of these on-the-fly mechanisms, which can lead to significant applications in data-centered parallel computing.