RAPTOR - A Scalable Platform for Rapid Prototyping and FPGA-based Cluster Computing

A number of FPGA-based rapid prototyping systems for ASIC emulation and hardware acceleration have been developed in recent years. In this paper we present a prototyping system with distinct flexibility and scalability. The designs are described from an architectural point of view, and measurements of the communication infrastructure are presented. Additionally, the properties of the system are shown using examples that can be scaled from a single-FPGA implementation to a multi-FPGA, cluster-based implementation.

Introduction

In the process of developing microelectronic systems, a fast and reliable methodology for the realization of new architectural concepts is of vital importance. Prototypical implementations help to convert new ideas into products quickly and efficiently. Furthermore, they allow the hardware and software for a given application to be developed in parallel, thus shortening time to market. FPGA-based hardware emulation can be used for functional verification of new MPSoC architectures as well as for HW/SW co-verification and for design-space exploration [1,2,3].

The rapid prototyping systems of the RAPTOR family, which have been developed in the System and Circuit Technology group in Paderborn during the last ten years, provide the user with a complete hardware and software infrastructure for ASIC and MPSoC prototyping. A distinctive feature of the RAPTOR systems is that the platform can be easily scaled from the emulation of small embedded systems to the emulation of large MPSoCs with hundreds of processors.

1. RAPTOR-X64 – A Platform for Rapid Prototyping of Embedded Systems

The rapid prototyping system RAPTOR-X64, successor of RAPTOR2000 [4], integrates all key components to realize circuit and system designs with a complexity of up to 200 million transistors.
Along with rapid prototyping, the system can be used to accelerate computationally intensive applications and to perform partial dynamic reconfiguration of Xilinx FPGAs.

Footnote 1: This work was partly supported by the Collaborative Research Center 614 – Self-Optimizing Concepts and Structures in Mechanical Engineering – University of Paderborn.

RAPTOR-X64 is designed as a modular rapid prototyping system: the base system offers communication and management facilities, which are used by a variety of extension modules that realize application-specific functionality. For hardware emulation, FPGA modules equipped with the latest Xilinx FPGAs and dedicated memory are used. Prototyping of complete SoCs is enabled by various additional modules providing, e.g., communication interfaces (Ethernet, USB, FireWire, etc.) as well as analog and digital I/Os.

The local bus and the broadcast bus, both embedded in the baseboard architecture, form a powerful communication infrastructure that provides high-speed communication with the host system and between individual modules, as depicted in figure 1. Furthermore, direct links between neighboring modules can be used to exchange data with a bandwidth of more than 20 GBit/s. For communication with the host system, either a PCI-X interface or an integrated USB 2.0 interface can be used. Both interfaces are directly connected to the local bus, enabling either closely coupled, high-speed, PCI-X-based communication or loosely coupled, USB-based communication. As configuration and application data can either be supplied directly from the host system or stored on a compact flash card, standalone operation is also supported. Therefore, the system is especially suitable for in-field evaluation and test of embedded applications.
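The host-coupling and configuration options described above can be summarized in a small model. The following Python sketch is illustrative only: the names HostInterface, ConfigSource, OperatingMode and select_mode are hypothetical and not part of the RAPTOR software, and the preference order (PCI-X before USB before standalone) is an assumption made for this example rather than documented behavior.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class HostInterface(Enum):
    """Host interfaces named in the text; both attach to the local bus."""
    PCI_X = "closely coupled"
    USB_2_0 = "loosely coupled"


class ConfigSource(Enum):
    """Where configuration and application data may come from."""
    HOST = "host system"
    COMPACT_FLASH = "compact flash card"


@dataclass(frozen=True)
class OperatingMode:
    host_interface: Optional[HostInterface]  # None means no host is attached
    config_source: ConfigSource

    @property
    def standalone(self) -> bool:
        # Standalone operation: the system runs without any host connection.
        return self.host_interface is None


def select_mode(pci_x_available: bool,
                usb_available: bool,
                flash_has_config: bool) -> OperatingMode:
    """Pick an operating mode (hypothetical policy, assumed for illustration)."""
    if pci_x_available:
        return OperatingMode(HostInterface.PCI_X, ConfigSource.HOST)
    if usb_available:
        return OperatingMode(HostInterface.USB_2_0, ConfigSource.HOST)
    if flash_has_config:
        return OperatingMode(None, ConfigSource.COMPACT_FLASH)
    raise RuntimeError("no host connection and no configuration on the flash card")
```

For instance, select_mode(False, False, True) yields a standalone mode that is configured from the compact flash card, which corresponds to operation without a host.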
In addition to these features, RAPTOR-X64 offers several diagnostic functions: besides monitoring of the digital system environment (e.g., the status of the communication system), relevant environmental information such as voltages and temperatures is recorded. All system clocks are finely adjustable over the whole working range, allowing hardware applications to run at their ideal speed.

The latest FPGA module currently available for RAPTOR-X64 (called DB-V4) hosts a Xilinx Virtex-4 FX100 FPGA and 4 GByte of DDR2 RAM (see figure 1). The FPGA includes two embedded PowerPC processors and 20 serial high-speed transceivers, each capable of transferring 6.5 GBit/s in full duplex. Utilizing these transceivers, four copper-based data links with a throughput of up to 32.5 GBit/s each are realized on the DB-V4 module. By adapting the cabling between the modules, the communication topology can be changed without affecting the communication via the RAPTOR base system.

[Figure 1 (diagram not reproduced). Recoverable labels: SelectMAP, CFG-JTAG; CTRL + Config Logic; Arbiter, MMU; Diagnostics, CLK, Configuration, etc.; PCI-X Bus; PCI-Bus Bridge (Master, Slave, DMA); Local-Bus (32 Bit Data / 32 Bit Address); Dual-Port SRAM; 85; CTRL, SMB]

Serial data transmission at data rates of up to 6.5 GBit/s necessitates techniques to maintain signal integrity between the FPGAs. Utilizing all integrated signal integrity features of the FPGA and providing a sophisticated PCB environment