Compilation Techniques for Out-of-Core Parallel Computations

Abstract The difficulty of handling out-of-core data limits the performance of supercomputers as well as the potential of parallel machines. Since writing an efficient out-of-core version of a program is a difficult task and virtual memory systems do not perform well on scientific computations, we believe that there is a clear need for a compiler-directed explicit I/O approach to out-of-core computations. In this paper, we first present an out-of-core compilation strategy based on a disk storage abstraction. Then, we offer a compiler algorithm that optimizes the locality of disk accesses in out-of-core codes by choosing a good combination of file layouts on disk and loop transformations. We introduce the concepts of memory coefficient and processor coefficient to characterize the behavior of out-of-core programs under different memory constraints. We also enhance our algorithm to handle data-parallel programs that contain multiple loop nests. Our initial experimental results, obtained on the IBM SP-2 and Intel Paragon, provide encouraging evidence that our approach is successful at optimizing programs that depend on disk-resident data on distributed-memory machines.
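To make the interaction between file layout and loop order concrete, the following is a minimal sketch and not the algorithm of the paper. It assumes a simplified cost model in which every contiguous run of elements in the file costs one I/O call; the function name `io_calls` and the layout labels are hypothetical and chosen only for illustration.

```python
def io_calls(tile_rows, tile_cols, n_rows, n_cols, layout):
    """Count the contiguous file runs (assumed: one I/O call each)
    needed to read a tile_rows x tile_cols tile of an n_rows x n_cols
    array stored in a single file with the given layout."""
    if not (0 < tile_rows <= n_rows and 0 < tile_cols <= n_cols):
        raise ValueError("tile must fit inside the array")
    if layout == "row-major":
        # Each tile row is one contiguous run; a full-width tile
        # merges all of its rows into a single run.
        return 1 if tile_cols == n_cols else tile_rows
    if layout == "column-major":
        # Symmetric case: each tile column is one contiguous run.
        return 1 if tile_rows == n_rows else tile_cols
    raise ValueError(f"unknown layout: {layout}")


if __name__ == "__main__":
    n = 1024
    # Two tile shapes with the same memory footprint (8192 elements).
    for shape in [(8, n), (n, 8)]:
        for layout in ("row-major", "column-major"):
            calls = io_calls(shape[0], shape[1], n, n, layout)
            print(shape, layout, calls)
```

Under this assumed model, a tile of the same size can cost one I/O call or 1024, depending on whether the tile shape implied by the loop order matches the file layout. This is why the two choices are made together rather than independently.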
