A Parallel Butterfly Algorithm

The butterfly algorithm is a fast algorithm which approximately evaluates a discrete analogue of the integral transform $\int_{\mathbb{R}^d} K(x,y) g(y)\,dy$ at large numbers of target points when the kernel, $K(x,y)$, is approximately low-rank when restricted to subdomains satisfying a certain simple geometric condition. In $d$ dimensions with $O(N^d)$ quasi-uniformly distributed source and target points, when each appropriate submatrix of $K$ is approximately rank-$r$, the running time of the algorithm is at most $O(r^2 N^d \log N)$. A parallelization of the butterfly algorithm is introduced which, assuming a message latency of $\alpha$ and per-process inverse bandwidth of $\beta$, executes in at most $O(r^2 \frac{N^d}{p} \log N + (\beta r\frac{N^d}{p}+\alpha)\log p)$ time using $p$ processes. This parallel algorithm was then instantiated in the form of the open-source \texttt{DistButterfly} library for the special case where $K(x,y)=\exp(i \Phi(x,y))$, where $\Phi(x,y)$ is a black-box, sufficiently smooth...
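The two running-time bounds above can be turned into a small cost model to see how the computation term, which shrinks like $1/p$, trades off against the communication term, which grows like $\log p$. The sketch below is illustrative only and assumes unit constants in every $O(\cdot)$ term, base-2 logarithms, and time measured in units of one arithmetic operation (so $\alpha$ and $\beta$ must be expressed in those units). The function names `serial_cost`, `parallel_terms` and `parallel_cost`, and the sample values of $N$, $d$, $r$, $\alpha$, $\beta$, are hypothetical and are not part of \texttt{DistButterfly}.

```python
import math


def serial_cost(N: int, d: int, r: int) -> float:
    """Leading-order serial bound r^2 * N^d * log2(N); all constants set to 1."""
    return r**2 * N**d * math.log2(N)


def parallel_terms(N: int, d: int, r: int, p: int,
                   alpha: float, beta: float) -> tuple[float, float]:
    """Split the parallel bound into (computation, communication).

    computation   = r^2 * (N^d / p) * log2(N)
    communication = (beta * r * N^d / p + alpha) * log2(p)

    Assumption: time is counted in units of one arithmetic operation, so
    alpha (message latency) and beta (inverse bandwidth per word) are
    given in those same units. Constants hidden by O(.) are dropped.
    """
    if p < 1:
        raise ValueError("need at least one process")
    compute = r**2 * (N**d / p) * math.log2(N)
    communicate = (beta * r * N**d / p + alpha) * math.log2(p)
    return compute, communicate


def parallel_cost(N: int, d: int, r: int, p: int,
                  alpha: float, beta: float) -> float:
    """Total modeled parallel time: computation plus communication."""
    compute, communicate = parallel_terms(N, d, r, p, alpha, beta)
    return compute + communicate


if __name__ == "__main__":
    # Hypothetical problem and machine parameters, chosen only for illustration.
    N, d, r = 1024, 2, 16
    alpha, beta = 1e4, 10.0
    for p in (1, 16, 256, 4096):
        comp, comm = parallel_terms(N, d, r, p, alpha, beta)
        speedup = serial_cost(N, d, r) / (comp + comm)
        print(f"p={p:5d}  compute={comp:.3e}  comm={comm:.3e}  speedup={speedup:.1f}")
```

With $p = 1$ the $\log p$ factor is zero, so the model reduces to the serial bound; as $p$ grows, the latency term $\alpha \log p$ eventually dominates once $N^d/p$ becomes small, which is where the modeled speedup stops improving.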
