A Data-Parallel Formulation for Divide and Conquer Algorithms

This paper presents a general data-parallel formulation for a class of problems based on the divide-and-conquer strategy. A combination of three techniques (mapping vectors, index-digit permutations and space-filling curves) is used to reorganize the algorithmic dataflow, providing great flexibility to exploit data locality efficiently and to reduce and optimize communications. In addition, these techniques allow the reorganized dataflows to be translated easily into HPF (High Performance Fortran) constructs. Finally, experimental results on the Cray T3E validate our method.
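
To make two of the named techniques more concrete, here is a minimal Python sketch of an index-digit permutation. An array of 2^n elements is reordered by permuting the binary digits of each element's index. The helper names (`index_digit_permute`, `bit_reversal`, `perfect_shuffle`, `morton_order`) and the bit-numbering convention are illustrative assumptions, not the paper's notation. Bit reversal and the perfect shuffle are standard examples. Morton (Z-order) is included only because it is the simplest space-filling curve that can be written as a digit permutation. The abstract does not say which curve the paper uses.

```python
def index_digit_permute(data, perm):
    """Reorder `data` (length 2**n) by permuting the binary digits of each index.

    Convention assumed here: perm[k] = j means bit k of an element's source
    index becomes bit j of its destination index.
    """
    n = len(perm)
    if len(data) != 1 << n:
        raise ValueError("len(data) must equal 2**len(perm)")
    if sorted(perm) != list(range(n)):
        raise ValueError("perm must be a permutation of 0..n-1")
    out = [None] * len(data)
    for src, value in enumerate(data):
        dst = 0
        for k in range(n):
            # Move bit k of the source index to bit position perm[k].
            dst |= ((src >> k) & 1) << perm[k]
        out[dst] = value
    return out


def bit_reversal(n):
    """Digit permutation that reverses an n-bit index (the classic FFT reordering)."""
    return [n - 1 - k for k in range(n)]


def perfect_shuffle(n):
    """Digit permutation that rotates an n-bit index left by one bit."""
    return [(k + 1) % n for k in range(n)]


def morton_order(m):
    """Digit permutation taking a row-major 2**m x 2**m index to Z-order.

    Row-major index: column bits occupy 0..m-1, row bits occupy m..2m-1.
    Z-order interleaves them: column bit i -> bit 2i, row bit i -> bit 2i+1.
    """
    return [2 * i for i in range(m)] + [2 * i + 1 for i in range(m)]


if __name__ == "__main__":
    # Bit-reversed order of 8 elements.
    print(index_digit_permute(list(range(8)), bit_reversal(3)))
    # Perfect shuffle (interleave the two halves) of 8 elements.
    print(index_digit_permute(list(range(8)), perfect_shuffle(3)))
    # Z-order traversal of a 4 x 4 array stored in row-major order.
    print(index_digit_permute(list(range(16)), morton_order(2)))
```

Each reordering is only a relabelling of index bits. It can therefore be composed or inverted by composing or inverting `perm`, without touching the data-movement loop. This is one reason such permutations are convenient for describing a reorganized dataflow.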
