A Design Methodology for Data-Parallel Applications

A methodology for the design and development of data-parallel applications and components is presented. Data parallelism is a well-understood form of parallel computation, yet developing even simple applications can require substantial effort to express the problem in low-level notations. We describe a software development process for data-parallel applications that starts from high-level specifications and generates successive design refinements to match different architectural models and performance constraints, so that development decisions can be guided by cost-benefit analysis. The primary issues are algorithm choice, correctness, and efficiency, followed by data decomposition, load balancing, and message-passing coordination. The development of a data-parallel multitarget tracking application serves as a case study, showing the progression from high-level to low-level refinements. We conclude by describing tool support for the process.
