Compiling Python modules to native parallel modules using Pythran and OpenMP annotations

Abstract: High Performance Computing users traditionally rely on low-level, compiled languages such as C or FORTRAN for compute-intensive tasks. As a consequence, it is common for a High Performance Computing application to be written in a high-level language such as Python that calls native routines for its compute-intensive parts. Using a higher-level language like Python throughout is attractive because it improves development speed and reduces maintenance costs. While Python is usually associated with low performance, several solutions such as Cython, Numba, Parakeet or Pythran can automatically or semi-automatically turn Python functions into native ones. One of the key requirements for matching the performance of native applications is the ability to write parallel code. This paper studies the addition of OpenMP directives, a popular model for describing parallelism in C/C++/FORTRAN applications, to Pythran, an automatic compiler from a subset of Python to C++. It shows that scientific Python applications annotated with OpenMP directives can be turned by an automatic compiler into native applications whose performance is within the same order of magnitude as that of manually written ones.
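To make the approach concrete, below is a minimal sketch of the kind of annotated kernel the paper targets, assuming Pythran's comment-based conventions: a #pythran export line declaring the module interface, and an OpenMP directive carried in an #omp comment. The function name and body are illustrative examples, not taken from the paper.

# pi_kernel.py
#pythran export pi_approximate(int)

def pi_approximate(n):
    # Numerical integration of 4/(1+x^2) over [0,1], which converges to pi.
    step = 1.0 / n
    s = 0.0
    # OpenMP directive expressed as a comment: the loop iterations are
    # distributed across threads, and partial sums are combined through
    # the reduction clause, exactly as in the C/FORTRAN equivalent.
    #omp parallel for reduction(+:s)
    for i in range(n):
        x = (i + 0.5) * step
        s += 4.0 / (1.0 + x * x)
    return s * step

Because the directives live in comments, the module remains valid sequential Python that runs unchanged under the standard interpreter; compiling it with something like pythran -fopenmp pi_kernel.py (command shown for illustration) would produce a native parallel extension module importable from CPython.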
