LogGP: incorporating long messages into the LogP model—one step closer towards a realistic model for parallel computation

We present a new model of parallel computation, the LogGP model, and use it to analyze a number of algorithms, most notably the single-node scatter (one-to-all personalized broadcast). The LogGP model extends the LogP model of parallel computation, which abstracts the communication of fixed-size short messages through four parameters: the communication latency (L), the overhead (o), the gap between messages (g, the reciprocal of the per-processor communication bandwidth), and the number of processors (P). As experimental data show, the LogP model can accurately predict communication performance when only short messages are sent (as on the CM-5). However, many existing parallel machines have special support for long messages and achieve a much higher bandwidth for long messages than for short ones (e.g., the IBM SP-2, Intel Paragon, Meiko CS-2, and nCUBE/2). We therefore extend the basic LogP model with a linear cost model for long messages. The resulting model, which we call the LogGP model of parallel computation, has one additional parameter, G, which captures the bandwidth obtained for long messages. Experimental data collected on the Meiko CS-2 show that this simple extension of the LogP model predicts communication performance quite accurately for both short and long messages. This paper discusses algorithm design and analysis under the new model, examining the all-to-all remap, FFT, and radix sort. We also examine the single-node scatter problem in more detail, deriving solutions and proving their optimality under the LogGP model. These solutions are qualitatively different from those obtained under the simpler LogP model, reflecting the importance of capturing long messages in a model.
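The cost model the abstract describes can be made concrete with a minimal sketch. Under LogP, a k-byte transfer must be sent as k fixed-size (here, single-byte) messages, each separated by the gap g; under LogGP, it is one long message in which each byte after the first costs the per-byte gap G. The standard point-to-point cost formulas are o + L + (k-1)g + o and o + (k-1)G + L + o respectively. The parameter values below are illustrative placeholders, not measurements from the paper:

```python
def logp_time(k, L, o, g):
    """LogP prediction: k single-byte short messages, pipelined at gap g
    (assumes g >= o, so the gap dominates the injection rate); the last
    message still incurs the latency L and the receive overhead o."""
    return o + (k - 1) * g + L + o

def loggp_time(k, L, o, G):
    """LogGP prediction: one long k-byte message; after the first byte,
    each additional byte costs G, the per-byte gap for long messages."""
    return o + (k - 1) * G + L + o

# Illustrative parameters (arbitrary units): g >> G, reflecting machines
# whose long-message bandwidth far exceeds their short-message bandwidth.
L, o, g, G = 5.0, 2.0, 3.0, 0.1
print(logp_time(1000, L, o, g))    # short-message cost grows with gap g
print(loggp_time(1000, L, o, G))   # long-message cost grows with gap G
```

For a single byte (k = 1) the two predictions coincide at 2o + L; the models diverge only as the message grows, which is exactly the regime where the abstract argues LogP mispredicts machines with long-message support.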
