Parallel techniques in irregular codes: cloth simulation as case of study

When parallelizing irregular applications on ccNUMA machines, several issues must be taken into account to achieve high performance. These include exploiting locality and parallelism, as well as careful use of memory resources (memory overhead). Many numerical simulation codes are clear examples of irregular applications. Such codes frequently have reduction operations at their core, so that a significant fraction of the computation time is spent on them. Cloth simulation belongs to this class of applications and is a topic of increasing interest in diverse areas, such as the multimedia industry. Moreover, when real-time simulation is the goal, parallelization becomes an important option. This paper discusses and compares different techniques for parallelizing irregular reductions on ccNUMA shared-memory machines. Broadly speaking, they fall into two groups: privatization-based and data partitioning-based methods. We describe a framework, based on data affinity, that makes it possible to develop various algorithms within the group of data partitioning-based techniques. All these techniques and approaches are analyzed and adapted to the computational structure of a real, physically based cloth simulator.
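
To make the two groups concrete, the following minimal Python sketch contrasts them on a toy irregular reduction in which each "spring" (i, j) adds a contribution to node i and subtracts it from node j. This is an illustration under stated assumptions, not the paper's implementation: the function names, the round-robin iteration assignment, the block ownership of the reduction array, and the sequential emulation of threads are all hypothetical choices made here for clarity.

```python
def sequential(n, edges, weights):
    """Reference irregular reduction: writes go through an indirection (i, j)."""
    f = [0.0] * n
    for (i, j), x in zip(edges, weights):
        f[i] += x
        f[j] -= x
    return f


def privatized(n, edges, weights, threads):
    """Privatization-based sketch: every (emulated) thread accumulates into
    its own full-size copy of the reduction array, and the copies are summed
    at the end. Memory overhead grows as n * threads."""
    private = [[0.0] * n for _ in range(threads)]
    for k, ((i, j), x) in enumerate(zip(edges, weights)):
        t = k % threads  # assumption: iterations dealt round-robin
        private[t][i] += x
        private[t][j] -= x
    return [sum(p[v] for p in private) for v in range(n)]


def partitioned(n, edges, weights, threads):
    """Data partitioning-based sketch: each (emulated) thread owns a
    contiguous block of the reduction array and updates only elements it
    owns, so no private copies are needed. An iteration whose two nodes have
    different owners is visited by both owners. For simplicity each owner
    scans every iteration here; a real scheme would presort iterations."""
    f = [0.0] * n
    block = -(-n // threads)  # ceiling division
    for t in range(threads):
        lo, hi = t * block, min(n, (t + 1) * block)
        for (i, j), x in zip(edges, weights):
            if lo <= i < hi:
                f[i] += x
            if lo <= j < hi:
                f[j] -= x
    return f


if __name__ == "__main__":
    edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (0, 5), (1, 4)]
    weights = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
    ref = sequential(6, edges, weights)
    print(privatized(6, edges, weights, 3) == ref)
    print(partitioned(6, edges, weights, 3) == ref)
```

The sketch highlights the trade-off named in the abstract: the privatized variant pays in memory overhead, whereas the partitioned variant writes only to locally owned data at the cost of deciding which iterations each owner must process.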
