Summary form only given. Modern FPGA architectures contain over one thousand small memory banks, over five hundred 4k-bit memory banks, and over one hundred thousand logic elements. This inherent parallelism makes the FPGA an ideal platform for a multiprocessor architecture. In addition to embedded memory, numerous ASIC multipliers are embedded in the FPGA fabric. This paper introduces a single-instruction-multiple-data (SIMD) system composed of 2, 4, 8, 16, 32, 64, or 88 processing elements (PEs) that are built around the ASIC multipliers and controlled by a central instruction stream. Beyond the multiplier function, we have augmented each PE with "custom instructions" to show how the instruction set can be extended. The 88-processor SIMD design uses 100% of the DSP blocks available in the Altera Stratix EPS80F1508C6 device but only 17% of the look-up table logic, which leaves 83% of the logic cells available for custom instructions.
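The organization described above (a central instruction stream driving many multiplier-based PEs, each extensible with custom instructions) can be illustrated with a small software model. This is a minimal sketch under stated assumptions, not the paper's hardware design: the class names `ProcessingElement` and `SimdArray`, the opcodes `add`, `mul`, and `absdiff`, and the accumulator-plus-local-memory PE layout are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# An operation combines the PE's accumulator with one local-memory operand.
Op = Callable[[int, int], int]

# Hypothetical base instruction set (assumed, not taken from the paper).
BASE_OPS: Dict[str, Op] = {
    "add": lambda acc, x: acc + x,
    "mul": lambda acc, x: acc * x,  # stands in for the embedded multiplier
}


@dataclass
class ProcessingElement:
    """One PE: a private memory bank plus a single accumulator (assumed layout)."""
    local_mem: List[int]
    acc: int = 0

    def execute(self, op: Op, addr: int) -> None:
        self.acc = op(self.acc, self.local_mem[addr])


class SimdArray:
    """A central controller that broadcasts each instruction to every PE."""

    def __init__(self, n_pes: int, mem_per_pe: int) -> None:
        self.pes = [ProcessingElement([0] * mem_per_pe) for _ in range(n_pes)]
        self.ops: Dict[str, Op] = dict(BASE_OPS)

    def add_custom_instruction(self, name: str, op: Op) -> None:
        # Models extending the instruction set with spare logic.
        self.ops[name] = op

    def broadcast(self, opcode: str, addr: int) -> None:
        # Single instruction, multiple data: same op, each PE's own operand.
        op = self.ops[opcode]
        for pe in self.pes:
            pe.execute(op, addr)

    def accumulators(self) -> List[int]:
        return [pe.acc for pe in self.pes]


# Usage: four PEs, each holding different data at the same addresses.
array = SimdArray(n_pes=4, mem_per_pe=2)
for i, pe in enumerate(array.pes):
    pe.local_mem = [i + 1, 10]

array.broadcast("add", 0)   # accumulators become [1, 2, 3, 4]
array.broadcast("mul", 1)   # accumulators become [10, 20, 30, 40]
array.add_custom_instruction("absdiff", lambda acc, x: abs(acc - x))
array.broadcast("absdiff", 0)  # accumulators become [9, 18, 27, 36]
```

In this model the loop in `broadcast` runs sequentially, whereas in hardware all PEs would execute the broadcast instruction in the same cycle; the sketch only captures the control structure, not the timing.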