Using an Artificial Neural Network to Predict Loop Transformation Time

Automatic software parallelization is a key issue in high performance computing. Many algorithms exist for transforming program loop nests into multithreaded code. However, the time required by such a transformation is usually unknown in advance, especially for transitive closure based algorithms. The computational complexity of transitive closure calculation is relatively high and may prevent the corresponding transformations from being applied. This paper presents the prediction of loop transformation time by means of an artificial neural network for the source-to-source TRACO compiler. An analysis of the loop nest structure and its dependences is used to estimate the time of TRACO transformations, and a feed-forward neural network is trained to predict that time. Experiments with various NAS Parallel Benchmarks show promise for the use of neural networks in automatic code parallelization and optimization.
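
As a rough illustration of the idea described above, the sketch below trains a small feed-forward regressor to predict transformation time from a few loop-nest features (nesting depth, statement count, number of dependence relations). The feature set, the synthetic data, and the use of scikit-learn's MLPRegressor are illustrative assumptions only, not the paper's actual feature extraction or network configuration.

```python
# Minimal sketch (not the paper's implementation): predict loop
# transformation time from loop-nest features with a feed-forward network.
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import train_test_split

# Hypothetical features of a loop nest: nesting depth, number of
# statements, number of dependence relations (illustrative values).
X = np.array([
    [2, 3, 5],
    [3, 4, 12],
    [4, 6, 30],
    [2, 2, 4],
    [5, 8, 55],
    [3, 5, 18],
])
# Measured transformation times in seconds (illustrative values).
y = np.array([0.4, 1.1, 6.5, 0.3, 22.0, 2.0])

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=0)

# A small feed-forward (multi-layer perceptron) regressor.
model = MLPRegressor(hidden_layer_sizes=(16, 8), max_iter=5000, random_state=0)
model.fit(X_train, y_train)

# Predict the transformation time for unseen loop nests.
print(model.predict(X_test))
```

In practice such a predictor could gate the transitive closure based transformation: if the predicted time exceeds a budget, the compiler may fall back to a cheaper technique.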
