Modern top-of-the-line FPGAs can already host hundreds of simple soft-core processors. Because soft-cores often support floating point units (FPUs) through external interfaces, it becomes possible to explore whether sharing FPUs among several processors is worthwhile in many-soft-core systems. We build two variants of a many-soft-core with 16 NIOS II cores to test whether sharing the FPU yields a significant area reduction and whether the time overhead it introduces is significant. We find that the area savings are 30% relative to the non-shared FPU version for a 16-core system, and that the overhead in clock cycles is almost nonexistent for simple applications like matrix multiplication and below 2% for a parallel Mandelbrot application. However, if we also consider the reduction in maximum operating frequency that occurs as the number of processors increases, we find that sharing among 8 processors is a very good option, and that sharing among more than 12 processors is not advisable because of the excessive time overhead.