Towards automatic parallelization of “for” loops

An effective way to improve the performance of a computing device is to place multiple processing units on a single integrated die. Exploiting this performance gain requires parallel programs, yet many existing programs were written for serial execution, and manually redesigning all of them is tedious. Automatic parallelization of existing serial programs is therefore advantageous. One way to execute programs in parallel is to target a parallel computing platform; given the myriad platforms available, abstracting them away from the developer is beneficial. In this paper we propose an algorithm that detects `for' loops in C code that can be parallelized with the OpenMP platform. The proposed algorithm preserves the correctness of the program, and it performs the required parallelization without post-execution analysis, avoiding both execution of the code and monitoring of the resources the program accesses.
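The abstract itself contains no code, so the following minimal C sketch (the loop bodies and names are illustrative, not taken from the paper) shows the kind of decision such a detector must make: the first loop has no loop-carried dependences, so it can safely be annotated with an OpenMP pragma, while the second reads a value written by the previous iteration and must remain serial. Compile with an OpenMP-enabled compiler, e.g. gcc -fopenmp.

    #include <stdio.h>

    #define N 1000000

    int main(void) {
        static double a[N], b[N];

        /* Initialize input data serially. */
        for (int i = 0; i < N; i++)
            b[i] = (double)i;

        /* Each iteration writes only a[i] and reads only b[i]:
           no cross-iteration (loop-carried) dependence, so a detector
           of the kind described can safely emit this pragma. */
        #pragma omp parallel for
        for (int i = 0; i < N; i++)
            a[i] = 2.0 * b[i] + 1.0;

        /* Iteration i reads a[i-1], which the previous iteration wrote:
           a loop-carried dependence, so this loop must stay serial. */
        for (int i = 1; i < N; i++)
            a[i] = a[i] + a[i - 1];

        printf("a[N-1] = %f\n", a[N - 1]);
        return 0;
    }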
