A generic multi-accelerator system comprises a microprocessor unit that schedules the accelerators and manages the necessary data movements. With the processor acting as the control unit, the system incurs memory and task management delays that degrade overall performance. Avoiding this degradation requires an efficient memory manager and a high-speed scheduler that feeds prearranged data to the appropriate accelerator. In this work we propose integrating an efficient scheduler and an intelligent memory manager into an existing core known as PPMC (Programmable Pattern based Memory Controller), so that data movement and computational tasks are handled efficiently. The modified PPMC system improves performance by managing data movements and address generation in hardware and by scheduling accelerators without the intervention of a control processor or an operating system. The PPMC system is evaluated with six memory-intensive accelerators: Laplacian solver, FIR, FFT, Thresholding, Matrix Multiplication and 3D-Stencil. The modified PPMC system is implemented and tested on a Xilinx ML505 evaluation FPGA board. Its performance is compared with that of a MicroBlaze-based system running the Xilkernel operating system. Results show that the modified PPMC-based multi-accelerator system uses 50% fewer hardware resources, consumes 32% less on-chip power and achieves a speed-up of approximately 27× compared to the MicroBlaze-based system.