Simple Parallel Statistical Computing in R

In theory, many modern statistical procedures are trivial to parallelize. In practice, deploying a parallelized implementation that is robust and runs reliably across different cluster configurations and environments is far from trivial. We present a framework for the R statistical computing language that provides a simple yet powerful programming interface to a computational cluster of CPUs. This interface allows the rapid development of R functions that distribute independent computations across the nodes of the cluster. The approach can be extended to finer-grained parallelization if needed. The resulting framework allows statisticians to obtain significant speed-ups for some computations at little additional development cost. The particular implementation can be deployed in ad hoc heterogeneous computing environments.
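To make the idea of distributing independent computations concrete, the following is a minimal sketch of a parallel bootstrap. It assumes the `parallel` package bundled with R as a stand-in for the interface; it is not necessarily the API of the framework described here. The helper `boot_mean`, the two local workers, and the simulated data are illustrative assumptions.

```r
library(parallel)

# Hypothetical helper: one bootstrap replicate of the sample mean.
# The index i is unused; it only marks which replicate is being run.
boot_mean <- function(i, x) mean(sample(x, replace = TRUE))

# Simulated data, used here only for illustration.
set.seed(1)
x <- rnorm(200)

# Start two worker processes on the local machine (an assumption;
# a real cluster would list remote host names instead).
cl <- makeCluster(2)

# Give each worker its own reproducible random-number stream.
clusterSetRNGStream(cl, iseed = 42)

# Each replicate is independent, so the 1000 calls are split across workers.
reps <- parSapply(cl, 1:1000, boot_mean, x = x)

stopCluster(cl)

# Bootstrap estimate of the standard error of the mean.
se_hat <- sd(reps)
```

Because the replicates do not depend on one another, the only changes from a serial `sapply` call are creating the cluster, seeding the workers, and shutting the cluster down, which is the kind of low development cost the abstract refers to.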
