Parallel Astronomical Data Processing with Python: Recipes for multicore machines

Abstract: High-performance computing has been used in various fields of astrophysical research. However, most of it is implemented on massively parallel systems (supercomputers) or graphics processing unit (GPU) clusters. With the advent of multicore processors in the last decade, many serial software codes have been re-implemented in parallel mode to utilize the full potential of these processors. In this paper, we propose parallel processing recipes for astronomical data processing on multicore machines. The target audience is astronomers who use Python as their preferred scripting language and who may be using PyRAF/IRAF for data processing. Three problems of varied complexity were benchmarked on three different types of multicore processors to demonstrate how parallelizing data processing tasks reduces execution time. The native multiprocessing module available in Python makes implementing the parallel code relatively straightforward. We have also compared three multiprocessing approaches: Pool/Map, Process/Queue and Parallel Python. Our test codes are freely available and can be downloaded from our website.
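The first two approaches named in the abstract can be illustrated with the standard-library multiprocessing module. The following is a minimal sketch, not the authors' benchmark code: `reduce_frame` is a hypothetical stand-in for a CPU-bound per-image task, and the function names, worker count, and input list are assumptions made for illustration. Parallel Python is a third-party package and is not shown.

```python
import multiprocessing as mp


def reduce_frame(frame_id):
    """Hypothetical stand-in for a CPU-bound per-image task (not from the paper)."""
    total = 0
    for i in range(10_000):
        total += (frame_id * i) % 7
    return frame_id, total


def run_pool(frame_ids, workers=2):
    """Pool/Map: the pool distributes the input list across worker processes."""
    with mp.Pool(processes=workers) as pool:
        # map() returns results in the same order as the inputs.
        return pool.map(reduce_frame, frame_ids)


def _worker(task_queue, result_queue):
    """Process/Queue worker: pull frame ids until a None sentinel arrives."""
    for frame_id in iter(task_queue.get, None):
        result_queue.put(reduce_frame(frame_id))


def run_process_queue(frame_ids, workers=2):
    """Process/Queue: explicit worker processes fed through a shared task queue."""
    task_queue = mp.Queue()
    result_queue = mp.Queue()
    procs = [
        mp.Process(target=_worker, args=(task_queue, result_queue))
        for _ in range(workers)
    ]
    for p in procs:
        p.start()
    for frame_id in frame_ids:
        task_queue.put(frame_id)
    for _ in procs:
        task_queue.put(None)  # one sentinel per worker so each one exits
    # Collect results before joining so no worker blocks on a full pipe.
    results = [result_queue.get() for _ in frame_ids]
    for p in procs:
        p.join()
    # Results arrive in completion order; sort by frame id for comparison.
    return sorted(results)


if __name__ == "__main__":
    ids = list(range(8))
    serial = [reduce_frame(i) for i in ids]
    assert run_pool(ids) == serial
    assert run_process_queue(ids) == serial
    print("parallel results match serial results")
```

Pool/Map needs the least code and preserves input order, while Process/Queue gives explicit control over how tasks are handed to workers at the cost of managing sentinels and result ordering by hand.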
