Accurate Performance Models of Parallel Low Level Image Processing Operations Based on a Simple A

This paper presents performance models for accurate prediction of the execution time of a large set of image processing operations, executed on distributed memory MIMD-style parallel hardware. The models incorporate a strict separation between the two (essentially unrelated) aspects of parallel execution: (1) sequential computation and (2) inter-process communication. The model parameters are based on the Abstract Parallel Image Processing Machine (APIPM) and its related instruction set, both introduced in this paper. The APIPM is designed to reflect common hardware characteristics and typical behavior of the class of parallel machines under consideration. We apply the performance models in an infrastructure that enables transparent development of high-performance image processing applications. The infrastructure's main component is a software library containing parallel versions of many common low level image processing operations. To hide the parallelism from the user, whilst retaining efficiency of execution on a range of parallel machines, it is essential for the library operations to be self-optimizing. This process of optimization is guided by the performance models described in this paper. Experiments show that for realistic image processing operations performance predictions are highly accurate. These results suggest that the models form a powerful basis for automatic parallelization and optimization of complete image processing applications.
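As a rough illustration of the separation between sequential computation and inter-process communication, the following minimal sketch predicts the execution time of a per-pixel operation as the sum of two independent terms and uses the prediction to pick a processor count. It is not the APIPM model itself: the parameter names (`t_pixel_op`, `t_latency`, `t_per_byte`), the row-strip decomposition, and the root-based scatter/gather cost are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class MachineParams:
    """Hypothetical machine parameters, e.g. obtained by benchmarking."""
    t_pixel_op: float   # seconds per basic per-pixel operation (assumed)
    t_latency: float    # fixed cost in seconds per message (assumed)
    t_per_byte: float   # transfer cost in seconds per byte (assumed)


def predict_time(width, height, ops_per_pixel, bytes_per_pixel, nprocs, m):
    """Predicted run time as computation time plus communication time."""
    # (1) Sequential computation: each process handles one strip of rows.
    rows = -(-height // nprocs)  # ceiling division
    t_comp = rows * width * ops_per_pixel * m.t_pixel_op

    # (2) Communication: assume the root sends one strip to, and receives
    # one strip from, every other process, one message at a time.
    if nprocs == 1:
        t_comm = 0.0
    else:
        strip_bytes = rows * width * bytes_per_pixel
        t_comm = 2 * (nprocs - 1) * (m.t_latency + strip_bytes * m.t_per_byte)

    return t_comp + t_comm


def best_nprocs(width, height, ops_per_pixel, bytes_per_pixel, max_procs, m):
    """Pick the processor count with the lowest predicted run time."""
    return min(
        range(1, max_procs + 1),
        key=lambda p: predict_time(
            width, height, ops_per_pixel, bytes_per_pixel, p, m
        ),
    )
```

A self-optimizing library operation could, under these assumptions, call `best_nprocs` before execution and fall back to a single process whenever the predicted communication cost outweighs the gain from splitting the computation.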
