Acceleration of a 2D-FFT on an Adaptable Computing Cluster

Despite a decade of research into their use, FPGA-based custom computing machines still accelerate only a limited range of applications. Recent advances in network technology offer an opportunity to apply custom computing machines more generally. We develop the idea of an intelligent network adapter for cluster-based parallel computing and call the resulting architecture an Adaptable Computing Cluster. The results presented suggest that placing FPGAs in the data path to the network dramatically improves the performance and scalability of the target applications. This is especially noteworthy because these applications have historically performed poorly on both clusters and custom computing machines. This paper discusses how FPGAs can provide network functionality while also increasing compute power. It focuses on one application, the 2D Fast Fourier Transform, and offers additional insights into the implications for parallel computing on a cluster.
