Usage of the TRACO Compiler for Neural Network Parallelization

Artificial neural networks (ANNs) are often used to solve a wide variety of problems by means of high-performance computing. This paper presents automatic loop parallelization for selected ANN programs using the TRACO compiler, which extracts loop dependences and produces synchronization-free slices comprising loop statement instances. Coarse-grained parallelism of nested program loops is obtained by creating, for each processor, a thread of computations that executes independently. Program loops of recurrent and back-propagation networks are analysed. The speed-up and efficiency of the parallel programs produced with TRACO are studied. Related compilers and ANN parallelization techniques are discussed, and future work is outlined.
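
As a rough illustration of such synchronization-free, coarse-grained loop parallelism, the C/OpenMP sketch below parallelizes the outer loop of a back-propagation-style forward pass. The array names, layer sizes, sigmoid activation, and the particular OpenMP pragma are illustrative assumptions, not code taken from the paper or generated by TRACO.

/* Hypothetical sketch: coarse-grained, synchronization-free parallelization
 * of a back-propagation forward-pass loop nest, in the spirit of the OpenMP
 * code an automatic parallelizer could emit. Names and sizes are assumed. */
#include <math.h>
#include <omp.h>

#define N_IN  256   /* input-layer size  (assumed) */
#define N_OUT 128   /* output-layer size (assumed) */

void forward_layer(const double x[N_IN],
                   const double w[N_OUT][N_IN],
                   const double b[N_OUT],
                   double y[N_OUT])
{
    /* Each iteration of the outer loop writes a distinct y[j] and reads only
     * shared, read-only data, so the iterations form synchronization-free
     * slices that can be executed as independent threads. */
    #pragma omp parallel for
    for (int j = 0; j < N_OUT; ++j) {
        double s = b[j];
        for (int i = 0; i < N_IN; ++i)
            s += w[j][i] * x[i];
        y[j] = 1.0 / (1.0 + exp(-s));   /* sigmoid activation (assumed) */
    }
}

Because no iteration of the outer loop reads data written by another, no barriers or locks are required inside the parallel region; this is the kind of independent computation slice the abstract refers to.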
