The growth in size and performance of field-programmable gate arrays (FPGAs) has compelled system-on-a-programmable-chip (SoPC) designers to use soft processors to control systems with large numbers of intellectual property (IP) blocks. The processor accesses these IP blocks as peripheral devices, through custom instructions (CIs), or both. In large systems, chip multiprocessors (CMPs) are used to execute many programs concurrently. When these programs need the same IP blocks and access them as peripheral devices, they may have to stall while waiting for their turn. In the case of CIs, the FPGA logic blocks that implement the CIs may have to be replicated for each processor. In both cases FPGA area is wasted, either on idle soft processors or on replicated CI logic blocks. This paper presents a multithreaded (MT) soft processor for area reduction in SoPC implementations. An MT processor allows multiple programs to access the same IP without replicating either the logic or whole processors. We first designed a single-threaded processor that is instruction-set compatible with Altera's Nios II soft processor. Our processor is approximately the same size as the Nios II economy version, with equivalent performance. We then augmented our processor with 4-way interleaved multithreading. This paper compares the area usage and performance of the MT processor against two CMP systems, one built from Altera's single-threaded processors and one built from ours. Our results show area savings of about 45% for the processor itself, in addition to the area saved by not replicating CI logic blocks.
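As a rough illustration of what 4-way interleaved multithreading means, the sketch below issues one instruction per cycle and rotates over four thread contexts in a fixed order. The abstract does not describe the actual thread-selection logic, so the fixed round-robin policy, the function name `interleave`, and the instruction labels are all assumptions made for illustration only.

```python
def interleave(programs, num_threads=4):
    """Issue one instruction per cycle, rotating over thread contexts.

    programs: one instruction list per thread (hypothetical labels).
    Returns the issue order as (cycle, thread_id, instruction) tuples.
    Assumption: fixed round-robin slots; the real design may differ.
    """
    if len(programs) > num_threads:
        raise ValueError("more programs than hardware thread contexts")
    pcs = [0] * len(programs)  # one program counter per thread context
    trace = []
    cycle = 0
    while any(pc < len(p) for pc, p in zip(pcs, programs)):
        tid = cycle % num_threads  # thread that owns this cycle's slot
        if tid < len(programs) and pcs[tid] < len(programs[tid]):
            trace.append((cycle, tid, programs[tid][pcs[tid]]))
            pcs[tid] += 1
        cycle += 1  # a slot whose thread has nothing to issue is a bubble
    return trace


if __name__ == "__main__":
    demo = [["a0", "a1"], ["b0"], ["c0", "c1"], ["d0"]]
    for cycle, tid, instr in interleave(demo):
        print(cycle, tid, instr)
```

Because only one thread issues in any given cycle, a single copy of a custom-instruction block can serve all four programs in this model, which is the area argument the abstract makes.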