Algorithm Engineering for Parallel Computation

The emerging discipline of algorithm engineering has primarily focused on transforming pencil-and-paper sequential algorithms into robust, efficient, well-tested, and easy-to-use implementations. As parallel computing becomes ubiquitous, we need to extend algorithm engineering techniques to parallel computation. Such an extension adds significant complications. After a short review of algorithm engineering achievements for sequential computing, we review the various complications caused by parallel computing, present some examples of successful efforts, and give a personal view of possible future research.
