P-3PC: A Point-to-Point Communication Model for Automatic and Optimal Decomposition of Regular Domain Problems

One of the most fundamental problems confronting automatic parallelization tools is finding an optimal domain decomposition for a given application. For regular domain problems (such as simple matrix manipulations), this task may seem trivial. However, communication costs in message-passing programs often depend significantly on the memory layout of the data blocks to be transmitted. As a consequence, straightforward domain decompositions may be non-optimal. In this paper, we introduce a new point-to-point communication model, called P-3PC (Parameterized model based on the Three Paths of Communication), that is specifically designed to overcome this problem. In comparison with related models (e.g., LogGP), P-3PC is similar in complexity but more accurate in many situations. Although the model is aimed at MPI's standard point-to-point operations, it is applicable to similar message-passing definitions as well. The effectiveness of the model is tested in a framework for automatic parallelization of low-level image processing applications. Experiments are performed on two Beowulf-type systems, each having a different interconnection network and a different MPI implementation. The results show that, where other models frequently fail, P-3PC correctly predicts the communication costs related to any type of domain decomposition.
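The dependence on memory layout can be illustrated with a minimal sketch. It is not the P-3PC model itself and predicts no costs; it only shows how the border a partition exchanges with its neighbour is laid out in memory, assuming a row-major two-dimensional array and a border one element wide. The function name `border_layout` and its parameters are hypothetical.

```python
def border_layout(height, width, split, halo=1):
    """Describe the border one partition sends to a neighbour.

    Assumes a row-major array of `height` x `width` elements (an
    assumption of this sketch, not something the abstract specifies).
    Returns (runs, run_length): the number of separate contiguous
    memory runs in the border and the number of elements in each run.
    """
    if split == "rows":
        # Horizontal strips: the border is `halo` full rows, which are
        # adjacent in row-major memory and form one contiguous block.
        return 1, halo * width
    if split == "columns":
        # Vertical strips: the border takes `halo` elements from each
        # of the `height` rows, so every row contributes its own run.
        return height, halo
    raise ValueError("split must be 'rows' or 'columns'")


# Both decompositions transmit 512 elements for a 512 x 512 array,
# but the layout of those elements in memory differs.
row_runs, row_len = border_layout(512, 512, "rows")
col_runs, col_len = border_layout(512, 512, "columns")
```

Under these assumptions the two borders hold the same number of elements, yet one is a single contiguous block and the other is scattered over 512 separate locations. This is the kind of difference a model based on message size alone cannot distinguish.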
