Lazy parallelization: a finite state machine based optimization approach for data parallel image processing applications

Existing library-based parallelization tools for high-performance image processing applications often deliver sub-optimal performance, because library implementations rarely incorporate inter-operation optimization (that is, optimization across library calls). This paper presents 'lazy parallelization', a simple and efficient finite state machine-based method for global performance optimization. Experimental results based on this approach show significant performance improvements over non-optimized parallel implementations.
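To make the idea of optimizing across library calls concrete, here is a minimal Python sketch. It is not the paper's actual state machine or library API: the two states (`ROOT`, `SCATTERED`), the operation labels `"par"` and `"seq"`, and the functions `run_eager` and `run_lazy` are all hypothetical names chosen for illustration. The sketch assumes each data-parallel operation needs the image partitioned across nodes, and each sequential operation needs it on a single node. It counts only the communication steps each strategy would issue.

```python
from enum import Enum, auto


class Dist(Enum):
    """Hypothetical distribution states of one image (an assumption)."""
    ROOT = auto()       # whole image resides on a single node
    SCATTERED = auto()  # image is partitioned across all nodes


# Communication step required for each state transition.
TRANSITIONS = {
    (Dist.ROOT, Dist.SCATTERED): "scatter",
    (Dist.SCATTERED, Dist.ROOT): "gather",
}


def run_eager(ops):
    """Non-optimized baseline: every parallel op scatters and gathers."""
    log = []
    for op in ops:
        if op == "par":
            log += ["scatter", "gather"]
    return log


def run_lazy(ops):
    """Track the image state and communicate only on a state change."""
    state = Dist.ROOT
    log = []
    for op in ops:
        wanted = Dist.SCATTERED if op == "par" else Dist.ROOT
        if state is not wanted:
            log.append(TRANSITIONS[(state, wanted)])
            state = wanted
    if state is not Dist.ROOT:
        # Bring the final result back to a single node.
        log.append(TRANSITIONS[(state, Dist.ROOT)])
    return log


if __name__ == "__main__":
    ops = ["par", "par", "par", "seq", "par"]
    print(len(run_eager(ops)), "communication steps (eager)")
    print(len(run_lazy(ops)), "communication steps (lazy)")
```

In this toy model, consecutive parallel operations share one scatter, because the state machine records that the image is already distributed. A real system would track more states and weigh actual communication costs.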
